
The Data Daily

Can Engineers Help Fill the Data Scientist Gap?
By Guest Contributor Published: 14:53, 3 April, 2018 Updated: 14:53, 3 April, 2018
by Seth DeLand, Product Manager, MathWorks
The European Commission has identified the need for 346,000 more European data scientists by 2020. Amid the UK’s digital skills gap, it is therefore no surprise that data analysis skills and data scientists are in high demand, yet there are not enough people with the knowledge to fill these roles.
Companies are looking for data scientists who have computer science skills, knowledge of statistics, and domain expertise relevant to their specific business problems. These candidates are proving elusive, but companies may find success by focusing on the last of these skills.
This third skill – domain expertise about the business – is often overlooked. Domain expertise is required to make judgement calls during the development of an analytic model. It enables one to distinguish between correlation and causation, between signal and noise, between an anomaly worth further investigation and “oh yeah, that happens sometimes”.
Domain knowledge is hard to teach: It requires on-the-job experience, mentorship, and time to develop. This type of expertise is often found in engineering and research departments that have built cultures around understanding the products they design and build. These teams are intimately familiar with the systems they work on.
They often use statistical methods and technical computing tools as part of their design processes, making the jump to the machine learning algorithms and big data tools of the data analytics world manageable.
With data science emerging across industries as an important differentiator, these engineers with domain knowledge need flexible and scalable environments that put the tools of the data scientist at their fingertips.
Depending on the problem, they might need traditional analysis techniques such as statistics and optimisation, data-specific techniques such as signal processing and image processing, or newer capabilities such as machine learning algorithms.
The cost of learning a new tool for each technique would be high, so having these tools together in one environment becomes very important.
So, a natural question to ask is: How can newer techniques like machine learning be made accessible to engineers with domain expertise?
The goal of machine learning is to identify the underlying trends and structure in data by fitting a statistical model to that data.
When working with a new dataset, it’s hard to know which model is going to work best; there are dozens of popular models to choose from (and thousands of less popular ones). Trying and comparing several different model types can be very time-consuming with “bleeding edge” machine learning algorithms: each has an interface specific to the algorithm and to the preferences of the researcher who developed it, so significant time is needed to try many different models and compare approaches.
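A minimal sketch of what such model comparison can look like in practice, assuming a scikit-learn-style environment with a consistent fit/score interface and cross-validation as a simple guard against over-fitting (the dataset, models, and settings below are illustrative choices, not anything referenced in this article):

```python
# Compare several common model types with 5-fold cross-validation, so accuracy
# is estimated on held-out folds rather than on the training data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # illustrative dataset

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm_rbf": SVC(kernel="rbf"),
}

# One consistent interface makes it cheap to try and compare many model
# families before committing to one.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The point of the sketch is not the particular models but the workflow: a common interface and built-in validation let an engineer with domain knowledge spend their time judging results rather than wrangling incompatible tools.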
One solution is an environment that makes it easy for engineers to try the most-trusted machine learning algorithms and that encourages best practices such as preventing over-fitting. For example, the process engineers at a large semiconductor manufacturing company were considering new ways to ensure alignment between the layers on a wafer.
They came across machine learning as a possible way to predict overlay between layers but, as process engineers, they didn’t have experience with this newer technique. Working through different machine learning examples, they were able to identify a suitable machine learning algorithm, train it on historical data, and integrate it into a prototype overlay controller.
Using the latest tools meant these process engineers had the ability to apply their domain expertise to build a model that can identify systematic and random errors that might otherwise go undetected.
According to Gartner, engineers with domain expertise “can bridge the gap between mainstream self-service analytics by business users and the advanced analytics techniques of data scientists.
“They are now able to perform sophisticated analysis that would previously have required more expertise, enabling them to deliver advanced analytics without having the skills that characterise data scientists.”
As technology continues to evolve, organisations must quickly ingest, analyse, verify, and visualise a tsunami of data to deliver timely insights and capitalise on business opportunities.
Instead of spending time and money searching for those elusive data scientists, companies can stay competitive by equipping their engineers and scientists with a flexible tool environment that enables them to do data science themselves – opening up access to the data for more people.
How enterprises are evolving their IT practices
By Guest Contributor Published: 10:00, 22 March, 2018 Updated: 16:07, 21 March, 2018
by Mike Bushong, Vice President, Enterprise and Cloud Marketing, Juniper Networks
A recent PwC study of enterprise IT buyers yielded insights about how enterprises are evolving their IT practices. Not surprisingly, the study found that IT leaders are planning to migrate more workloads to the cloud over the coming three-year period, with strong movement across all application types. Perhaps more interestingly, the transition to cloud seems to correspond with a meaningful shift in buying criteria.
 
Security requires all hands on deck
The survey found that the top buying criterion was security. While seeing security at the top of a list of purchase considerations is not surprising in a general sense, it is worth noting that this particular research was focused on data center networking.
In enterprise IT, networking and security have historically been separate domains. Seeing security rise to the top of networking purchase criteria is a remarkable shift in priorities, likely driven by the high-profile security incidents across the industry at large, and the perception that they are becoming both more frequent and more damaging.
The implications for enterprises could be profound. The blurring of lines between networking and security teams means that both security and operational practices will need to converge. As boundaries between teams fade, how solutions are architected and products evaluated will change. Visibility and automation tools will necessarily need to span more than just the security or the network devices.
For networking in particular, it suggests that the network must play an active role in strengthening an enterprise’s security posture. It seems inevitable that the network will serve as a source of information, critical in identifying and ultimately containing attacks within the enterprise.
 
Agility is the new TCO
PwC’s research also showed that automation has passed TCO in terms of importance. Strictly speaking, the study had automation as the number two criterion, but taken together with priorities four and five (agility and technology innovation, respectively), it is clear that the speed of business is a higher priority than the cost of business.
For companies that still see IT fundamentally as a cost center servicing the business, these conclusions might not ring true. But, for enterprises that are navigating their own digital transformations and coping with the adoption of new technology, these findings support the view that IT is increasingly a key strategic enabler.
As with security, the implications here are important. To unlock the power of automation, companies need to invest both in technology and people. Skills and process must accommodate an emerging set of tools that help accelerate innovation within enterprise IT. Enterprises should develop open source practices and broaden their solutions aperture to include new classes of products that promote heavy automation.
 
Cloud and multicloud as drivers of change
Going back to the opening PwC finding, all of this change is going to happen in the context of cloud and multicloud. As enterprises move workloads to public and private cloud, they will find natural security and operational inflection points.
While enterprises might consider their strategy as related to cloud, the reality is that most enterprises will find their futures are more likely to be multicloud. Whether it is supplier management leading to multiple cloud vendors, or security requiring on-premises data storage, or simply the acknowledgement that migrations will leave companies straddling both private and public cloud, the future is decidedly multicloud.
This provides the context in which the security and automation practices must change. In a multicloud world, the boundaries of security and automation do not end at the physical walls of the data center. Rather, they extend out to the public cloud and even to the cloud on-ramps that exist in both the campus and branch environments.
This requires explicit end-to-end planning, which means enterprises will need to change their architectural practices even more as they bring other teams into the process.
 
Don’t break the bank just yet
Of course, while the priorities have shifted a bit, it is not the case that cost is no longer a factor. PwC respondents indicate that TCO is still the number three priority, which means that all of this change has to happen while carefully considering the budget implications.
Enterprise IT will need to determine how to simultaneously support existing infrastructure while developing new disciplines for the go-forward models. Navigating these waters will prove difficult, and indeed, some companies will not survive the transition.
The key to balancing change and cost is leveraging natural expansion and refresh opportunities to ensure that there is constant movement forward.
And companies whose strategy has not yet considered the prospect of multicloud would do well to make certain that every step forward does not unintentionally close a potential future door. More simply: meet today’s needs while also becoming more multicloud-ready.
Changing the mindset: pushing out the Server Huggers and Country Huggers.
By João Marques Lima Published: 08:00, 20 March, 2018 Updated: 02:45, 20 March, 2018
I can remember a discussion with a leading co-location player several years ago when they were trying to work out who they ‘lost’ business against, says Steve Wallage, MD at BroadGroup Consulting.
 
All the analysis seemed to show that they were rarely losing against direct competitors.
This article originally appeared in the last issue of Data Economy Magazine.
Then they realized that their biggest ’competitor’ was the in-house option.
A great example of this was a recent survey of UK councils (local government) which, on the back of central government initiatives on data centre and IT sharing, asked about their willingness to work with other councils and outsourcers.
Around 90% admitted that they would rather do things the way they had always done them and manage everything themselves.
This concept of server-hugging has been expanded to country-hugging: concerns around data privacy, security and regulation are added to the mix, to ensure data is kept in-country.
The rise in demand in Frankfurt, and the desire of German users for a local data centre, was such that one co-location player suggested putting a statue of Edward Snowden in their office.
If the architects of GDPR were known to them, they could be immortalized in marble as well!
The strength of feeling in Germany can be seen in the 2016 Bitkom survey, which asked users for their key criteria in selecting cloud services, with any option possible from price to performance – having a data centre in Germany was the key issue.
 
So how to get around the issue?
First is to accept that some data and data centre requirements clearly need to be in-country, and that latency is key.
Some of our recent user research has also found that there is still a large group of users who want, or their end customer wants, to be able to easily visit the data centre.
There may also be other factors from regulatory to logistical.
 
Second is to get customers to start looking at Total Cost of Ownership, and particularly to understand the cost of power, which can be 30-50% of data centre OPEX.
Although telecoms can clearly be cheaper in somewhere like Amsterdam, this needs to be considered in the overall calculation.
Some co-location companies have found selling to the CFO in this regard a much more successful strategy.
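As a purely hypothetical back-of-the-envelope illustration of why power looms so large in that TCO conversation – every figure below is an assumption made for the sake of the example, not data from this article:

```python
# Rough annual power cost for a colocation deployment, and its share of OPEX.
it_load_kw = 500                 # assumed average IT load
pue = 1.5                        # assumed power usage effectiveness of the facility
price_per_kwh = 0.15             # assumed electricity price, EUR/kWh
other_opex_per_year = 1_200_000  # assumed staff, maintenance, telecoms, etc., EUR

facility_kw = it_load_kw * pue                         # total facility draw
power_cost_per_year = facility_kw * 24 * 365 * price_per_kwh
total_opex = power_cost_per_year + other_opex_per_year

print(f"Annual power cost: EUR {power_cost_per_year:,.0f}")
print(f"Share of OPEX: {power_cost_per_year / total_opex:.0%}")  # ~45% with these inputs
```

With these assumed inputs, power lands comfortably inside the 30-50% band quoted above, which is why a CFO-level conversation about power prices can move a siting decision.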
 
Third is to look at the disaggregation of applications rather than a ‘one size fits all’ data centre strategy. We are seeing banks in particular ahead in thinking about the different data centre requirements by application.
However, across many vertical markets, there are multiple applications which fit well into a lower resiliency, and often lower latency, environment.
 
Fourth is execution and marketing.
It still amazes me how many co-location proposals we see that focus on the specification of the site, and space and power options. Many users want to know about the benefits for their specific business, and how they can be supported and helped in their cloud strategies.
 
Fifth is breadth and reach.
Returning to the story in the first paragraph, what was also strange to the co-location provider was the sheer diversity of partners and ‘advisors’ used by customers.
These ranged from telcos to property firms to SIs and IT firms to outsourcing and cloud companies.
They are also often local companies, or larger companies with a local and regional presence.
 
Sixth, and finally, is the benefit of the new and advanced data centre.
The large European hubs have great existing ecosystems and infrastructure, but the data centres that house them were often built in the late 1990s.
A question we are often asked by investors is when technology obsolescence will hit this market, and when will demand and pricing at older sites start to really decline.
Many users look at the operating record, the ecosystems, the specification and security, and often conclude that older data centres remain entirely fit for purpose.
The key is to showcase the benefits of newer sites from efficiency to flexibility, and what that actually means for users.
 
As is so often the case, the reaction varies by customer, but we are seeing some users who really value newer data centres.
For example, a Chinese delegation recently asked us which data centres and co-location companies were worth visiting in Europe – their first criterion was that if a data centre was more than two years old, it would not be suitable for them.
Top tips for choosing a colocation partner
By Guest Contributor Published: 04:30, 13 March, 2018 Updated: 21:32, 12 March, 2018
by Jon Lucas, Director, Hyve Managed Hosting
When you’re choosing where to host your IT infrastructure there’s a lot to take into consideration. Security, redundancy, connectivity.
Colocation gives you complete control over your hardware, whilst your hosting provider takes care of physical security and network uptime. But what makes a good partner for the job?
Here are five top things to look for:
1. Location, location, location
A reliable data centre should be focused on uptime and location should play a critical part in the decision-making process. Working with a colocation partner that strives for as close to 100% availability as possible should be a key priority.
Tier 3 data centres boast uptime of 99.982% or more, which equates to no more than roughly 1.6 hours of downtime per year.
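The downtime figure follows directly from the availability percentage; a quick sketch of the arithmetic (99.982% is the commonly cited Tier III availability, used here as an assumption):

```python
# Maximum annual downtime implied by a given availability level.
hours_per_year = 24 * 365                      # 8,760 hours
availability = 0.99982                         # assumed Tier III availability
max_downtime_hours = hours_per_year * (1 - availability)
print(f"Maximum downtime: {max_downtime_hours:.1f} hours per year")  # ~1.6 hours
```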
Important to consider too is that the closer the data centre, the cheaper your networking costs, so choosing a partner who can guarantee a prime, central location with great connectivity is often your best bet.
2. Security and redundancy
As with any aspect of IT, security should be a high priority for your colocation provider. In the physical sense, this means dedicated facilities teams and 24/7 monitoring of access points, power supplies – the lot.
Additional resources in case of an emergency are also the sign of a dedicated partner that goes the extra mile – this means backup power supplies that offer additional redundancy, and an uninterruptible power supply (UPS).
3. Strength of communication
Choosing a data centre is not just about the physical characteristics of the facility – it’s about the people behind it. A good partnership is all about communication, and when it comes to data, it’s also vital for peace of mind.
Whether you want to amend your agreement, report an issue or simply ask for advice, expect nothing less than 24/7, 365 support from a partner’s technical team that knows your company inside out.
4. Ability to scale
Often in your own data centre, adding server capacity means physically building for more space, power and cooling. A good colocation partner will offer the option to increase your space as standard, leaving you much more agile for expansion as your company grows. Equally, you only pay for what you use, so there’s no risk of wasted resources.
5. Flexibility and transitional services
IT moves fast. Priorities change. Partners should be flexible enough to adapt to evolving needs and offer additional services if requirements call for it.
This might mean offering a managed service model that provides consultancy over your configuration, or offering to make changes to your master service agreement (MSA) mid-way through your contract. Taking advantage of your hosting partner’s managed services means that your infrastructure still runs smoothly without you having to take responsibility for it.
Essentially, any organisation that values the uptime of its IT services will consider the above before moving large amounts of equipment and vital data to a new location. Doing your research will only help save money in the long run and, while this isn’t a completely rigid set of guidelines, it’s a pretty damn good place to start.
Businesses can’t afford to compromise at the Edge
By Guest Contributor Updated: 20:29, 12 March, 2018
by Victor Avelar, Director and Senior Research Analyst, Schneider Electric, Data Center Science Center
The data centre industry is fast being forced to address a rapidly changing IT environment driven by the proliferation of connected devices and demand for content such as movies, gaming and applications.
The amount of data circulating through the world’s networks is increasing at a staggering rate. By 2020 Cisco estimates that some 50bn devices will be attached to the network. That doesn’t include the additional 20.8bn items already comprising the Internet of Things (IoT) and using sensors to aggregate and share data about their status.
While we are yet to see these predictions come to fruition, the question remains: should digital traffic nearly triple in the next five years, how would data centres cope with the demand?
One inevitable consequence is that more compute, storage, and connectivity will move to the Edge of the network, closer to the point of use and where the data is both created and consumed. Latency alone will require data centres to become more distributed, providing a gateway to and from the massive centralised hyperscale facilities to smaller, more localised operations, where response times to end-users need to be much faster.
However, the trade-off in scale and speed cannot be achieved at the expense of security, availability, and data integrity. On the contrary, many of the services provided by Edge data centres consider these to be three essential attributes on which no compromise may be tolerated.
For example, connectivity to applications and business-critical services hosted in the Cloud is dependent on the Internet, but if your company’s revenue depends on the uptime of cash registers at your retail locations, you wouldn’t want them to depend on that internet connectivity. Instead, you want each retail location to have a standardised, pre-configured, and pre-tested Edge solution which ensures it performs as expected regardless of its connection to a centralised data centre.
Another example is that of driverless cars, which will certainly require more local processing and compute power to achieve a rapid and latency-free speed of response – in addition to larger volumes of storage, which will ensure data from both vehicles and journeys alike is backed up and well documented.
And finally, there are the Internet-based content platforms such as Netflix, YouTube, and Prime, which depend on smaller edge data centres to provide a fast and resilient streaming service. Security remains another key consideration here; we cannot imagine a scenario where customers would trust or share their personal data with a platform or service that was not highly fault tolerant and resilient to malicious attacks.
In all of these scenarios, it is the combination of data centre infrastructure, advanced monitoring, and management software that protects two of the things we value most within the digital domain: our privacy and quality of service.
However, it is not only these services or applications that are driving sophistication at the edge. Data sovereignty is fast becoming a critical issue for any business thanks to the General Data Protection Regulation (GDPR), which is about to come into force and will clearly assign legal responsibility for data privacy and integrity across all parts of the data value chain.
The consequences of these developments, and many others, will require data centres, whatever their size and wherever their location, to become more cost-effective, more efficient, more available, more secure, and faster to deploy. The best way to achieve this is through standardisation, integration, and testing across vendors and products alike.
During a recent presentation given at DataCloud UK, Schneider Electric estimated that in future, the time taken to design a data centre may be halved, with new facilities delivered to customers up to 60% faster. Standardization and reference designs will become key to this process, but such improvements can only be achieved by a concerted effort from partners throughout the entire IT, networking, and infrastructure industries.
Fifth generation (5G) wireless networks will help to make the necessary data transfers possible, and inside the data centre itself, increased interoperability between the various vendors and product offerings will help both to reduce costs and footprint, whilst increasing reliability and speed of deployment. Such developments are already being driven by OEMs including HP, Cisco, and Nutanix as well as data centre infrastructure specialists like Schneider Electric.
But what’s most clear is that software will be the key enabler tying all these disparate elements together. Schneider Electric’s EcoStruxure for Data Centers™ is a framework for achieving just that, at three levels – Connected products, Edge control, and Apps, analytics & services.
At its foundation is sensor technology: the philosophy underpinning the IoT, which has long been applied to data centre equipment. Increasingly, connected products – including power distribution, UPS, cooling, and rack technology – will make use of both smart sensors and machine-to-machine (M2M) communication to provide ever more accurate and timely information about their status, including capacity and maintenance issues, delivered automatically to a central management console.
Edge control comprises software in the form of Data Centre Infrastructure Management (DCIM) and Building Management Systems (BMS). Both have been in use for many years, but in the future they will become even more interconnected, being used to monitor, manage, and improve the efficiency of infrastructure, whilst assisting in capacity planning, fault recognition, and proactive maintenance.
At the top of the EcoStruxure platform are the Apps, analytics & services, which now deliver detailed information about critical infrastructure to smart devices via the Cloud. Such apps can alert operations management and service personnel to issues requiring urgent attention at any time of day or night, allowing greater operational flexibility and interactive communications. These analytics will help to detect issues early, improving the operation and reliability of data centres.
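To make the layering concrete, here is a minimal, purely illustrative sketch of the three-level idea in generic Python – it does not use EcoStruxure or any Schneider Electric API, and every name in it is hypothetical: a connected product emits a status reading, an edge-control rule evaluates it locally, and only alerts are forwarded to a central apps-and-analytics console.

```python
import json
import random
import time
from dataclasses import dataclass, asdict
from typing import Optional

# Layer 1 - connected product: a (simulated) smart sensor on a UPS.
@dataclass
class SensorReading:
    device_id: str
    metric: str
    value: float
    timestamp: float

def read_ups_temperature() -> SensorReading:
    # Simulated M2M telemetry; a real device would push this over its own protocol.
    return SensorReading("ups-01", "temperature_c", random.uniform(20.0, 45.0), time.time())

# Layer 2 - edge control: a local, DCIM-style rule that runs close to the equipment.
def edge_control(reading: SensorReading, limit_c: float = 40.0) -> Optional[dict]:
    if reading.value > limit_c:
        return {"severity": "warning", "reading": asdict(reading)}
    return None  # nothing to escalate

# Layer 3 - apps, analytics & services: a central console receiving alerts via the cloud.
def forward_to_console(alert: dict) -> None:
    print("ALERT ->", json.dumps(alert))

if __name__ == "__main__":
    for _ in range(5):
        alert = edge_control(read_ups_temperature())
        if alert:
            forward_to_console(alert)
```

The design point the sketch illustrates is that filtering and decision-making happen at the edge, close to the equipment, while only the distilled alerts travel up to the cloud-hosted analytics layer.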
The data centre of the future will power applications of such complexity, in such competitive and fast-moving markets, that the issues of efficiency, resiliency, security, speed of deployment and cost will be considered with equal diligence.
Trade-offs between them will no longer do.