Big Data Engineer Resume
Bentonville, AR
SUMMARY
- Around 14 years of experience in the IT industry, encompassing analysis, design, and development of IT solutions for companies in the insurance, telecom, and automobile industries.
- Strong working experience with Big Data and the Hadoop ecosystem.
- Experience in data extraction and data integration from different data sources into a Hadoop data lake by creating ETL pipelines using Spark, Sqoop, Pig, and Hive.
- Good understanding of Hadoop architecture and the various components of the Hadoop ecosystem, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and YARN.
- Proficient in developing efficient SQL queries on a variety of databases such as Oracle, MySQL, PostgreSQL, and DB2.
- Good experience in writing Spark programs in Python and working with distributed datasets across a Hadoop cluster.
- Experience in leveraging Alteryx for Extract, Transform, Load (ETL) projects.
- Fair knowledge of developing Machine Learning and Deep Learning models in Python.
- Extensive development experience in Oracle PL/SQL, implementing business functionality across various domains.
- Experience working in Agile and Waterfall software development methodologies.
- Understanding of various services provided by AWS (Amazon Web Services), such as EC2, S3, VPC, IAM, CloudFront, CloudWatch, CloudFormation, Glacier, RDS, Route 53, SNS, SQS, ElastiCache, EMR, Kinesis, and Redshift.
- Experience in developing applications in the Oracle Applications 11.5 Order Management and Advanced Pricing modules.
- Fair understanding of developing dashboards in Qlik Sense and Tableau.
- Fair understanding of Microsoft .NET technologies.
TECHNICAL SKILLS
- Big Data Platforms: Hortonworks
- Big Data Technologies: HDFS, Pig, Hive, Oozie, Flume, Spark, HBase, Hue, Ambari
- Data Science Tools: Python, R, TensorFlow, Keras, Jupyter, Alteryx
- Cloud Computing (Amazon Web Services): EC2, S3, VPC, IAM, CloudFront, CloudWatch, CloudFormation, Glacier, RDS, Route 53, SNS, SQS, ElastiCache, EMR, Kinesis, Redshift, DynamoDB, Aurora
- Data Reporting and Visualization: Tableau, Qlik Sense, QlikView
- Database: Oracle, MySQL, PostgreSQL, DB2, SQL Server
- Web Technologies: HTML, ASP, ASP.NET, XML, JavaScript, VBScript
- Languages: SQL, PL/SQL, C, C++, C#, Visual Basic
- Development Tools: RStudio, PyCharm, SQuirreL SQL, pgAdmin, WinSCP, Toad, PL/SQL Developer, SQL*Plus, Visual Studio, Microsoft VSS
- ERP: Oracle Applications 11.5 Order Management, Advanced Pricing
PROFESSIONAL EXPERIENCE
Confidential, Bentonville, AR
Big Data Engineer
Environment: Hortonworks Hadoop, Hive, HDFS, Sqoop
Development Tools: Teradata SQL Assistant, PuTTY, Automic Scheduler, Jira, GitHub
Operating System: Unix, Windows 10
Responsibilities:
- Assembled large data sets that meet functional and non-functional business requirements.
- Developed optimal data pipelines using Hadoop big data tools such as Hive, Pig, Spark, and Automic Scheduler to serve the business use case.
- Designed big data solutions and developed SQL code for data extraction, transformation, loading, and reconciliation from various data sources such as Teradata and SQL Server into the Hadoop data warehouse to help generate business insights and address reporting needs.
- Developed Python utilities that automate manual efforts and speed up the overall development cycle.
- Implemented data quality checks and frameworks in Hadoop to identify and prevent data issues (see the sketch after this list).
- Maintained architectural principles and coding standards across the code and project lifecycle, including validating that data usage follows data security principles and that data manipulation is built to requirements.
- Worked with product and cross-functional teams to understand business requirements for new features being implemented on Confidential.com and translated them into technical documentation.
- Provided on-call production support by troubleshooting production issues such as batch job failures due to bad data or duplicate records, and released timely fixes to maintain service level agreements.
- Provided technical expertise and guidance to the offshore team on various projects through requirement analysis, design, and code reviews.
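A minimal sketch of one such data quality check, assuming PySpark on the Hadoop cluster; the Hive table name, key column, and thresholds below are hypothetical placeholders, not details of the actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: validate a Hive table after a batch load.
# The table name ("edw.sales_orders") and key column ("order_id")
# are illustrative placeholders.
spark = (SparkSession.builder
         .appName("dq-checks")
         .enableHiveSupport()
         .getOrCreate())

df = spark.table("edw.sales_orders")

total_rows = df.count()
null_keys = df.filter(F.col("order_id").isNull()).count()
duplicate_keys = (df.groupBy("order_id")
                    .count()
                    .filter(F.col("count") > 1)
                    .count())

# Fail the batch job if basic integrity rules are violated, so that
# downstream reporting never consumes bad or duplicate records.
if total_rows == 0 or null_keys > 0 or duplicate_keys > 0:
    raise ValueError(
        f"DQ check failed: rows={total_rows}, "
        f"null_keys={null_keys}, duplicate_keys={duplicate_keys}"
    )
```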
Confidential, Dearborn, MI
Data Engineer
Environment: Hortonworks Hadoop, Hive, HDFS, Sqoop
Development Tools: Teradata SQL Assistant, PuTTY, Automic Scheduler, Jira, GitHub
Operating System: Unix, Windows 10
Responsibilities:
- Utilized Apache Spark with Python to develop and execute big data analytics applications using Spark SQL (see the sketch after this list).
- Developed Sqoop and Hive SQL jobs executed via shell scripts and scheduled them through Oozie workflows.
- Developed new queries and Alteryx flows for extracting, transforming, and analyzing data to address project requirements.
- Developed dashboards in Tableau for project task monitoring.
- Served as data steward for 50+ dealer and customer data sources; single point of contact and SME for all operational aspects (inventory, data landing, profiling, and issue management).
- Facilitated collaborative troubleshooting sessions with global cross-functional teams (multiple DSC IT teams, data governance, analytical and business partners).
- Combined domain knowledge of the data with technical skills to deliver customized data products.
- Partnered with Data Governance to secure sensitive data and ensure adherence to OGC and regional regulations.
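A rough illustration of the Spark SQL analytics described above, assuming PySpark with Hive support; the dealer sales table and column names are invented for the example.

```python
from pyspark.sql import SparkSession

# Illustrative only: table and column names are hypothetical.
spark = (SparkSession.builder
         .appName("dealer-analytics")
         .enableHiveSupport()
         .getOrCreate())

# Aggregate a Hive source table with Spark SQL.
monthly_summary = spark.sql("""
    SELECT dealer_id,
           date_format(sale_date, 'yyyy-MM') AS sale_month,
           COUNT(*)                          AS units_sold,
           SUM(sale_amount)                  AS revenue
    FROM   lake.dealer_sales
    GROUP  BY dealer_id, date_format(sale_date, 'yyyy-MM')
""")

# Persist the aggregate back to the data lake for reporting tools.
(monthly_summary
    .write
    .mode("overwrite")
    .saveAsTable("lake.dealer_sales_monthly"))
```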
Confidential, Auburn Hills, MI
Data Engineer
Environment: Hortonworks Hadoop, Hive, HDFS, Sqoop
Development Tools: Teradata SQL Assistant, PuTTY, Automic Scheduler, Jira, GitHub
Operating System: Unix, Windows 10
Responsibilities:
- Worked with the project manager, business leaders, and technical teams to finalize requirements and create the solution design and architecture.
- Architected the data lake by cataloging the source data, analyzing entity relationships, and aligning the design with performance, schedule, and reporting requirements.
- Designed and developed Hadoop ETL solutions to move data into the data lake using big data tools such as Sqoop, Hive, Spark, and HDFS.
- Developed an automated process in Python to generate Unix shell scripts that perform Hadoop ETL functions such as Sqoop imports, creation of external/internal Hive tables, and initiation of HQL scripts (see the sketch after this list).
- Designed and developed Spark code using Python and Spark SQL for high-speed data processing to meet critical business requirements.
- Created automated workflows using Oozie in Hue and monitored them to debug any issues pertaining to the failure of a specific job.
- Developed transformations over large datasets in Spark SQL for use by QlikView applications.
- Created dashboards in Qlik Sense from the data gathered in the HDFS data lake.
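The script-generation process mentioned above could look roughly like the sketch below; the JDBC connection string, table list, HDFS paths, and HQL file locations are hypothetical placeholders rather than details of the actual implementation.

```python
from pathlib import Path

# Hypothetical source tables and connection details, for illustration only.
JDBC_URL = "jdbc:teradata://tdhost/DATABASE=edw"
TABLES = ["sales_orders", "dealer_master"]

SCRIPT_TEMPLATE = """#!/bin/bash
set -e

# Land the source table into HDFS with Sqoop.
sqoop import \\
  --connect {jdbc_url} \\
  --table {table} \\
  --target-dir /data/raw/{table} \\
  --num-mappers 4

# Create/refresh the external Hive table and load the managed table
# using pre-written HQL scripts (paths are placeholders).
hive -f /scripts/hql/create_{table}_ext.hql
hive -f /scripts/hql/load_{table}.hql
"""

def generate_scripts(output_dir: str = "generated_scripts") -> None:
    """Write one ingestion shell script per source table."""
    out = Path(output_dir)
    out.mkdir(exist_ok=True)
    for table in TABLES:
        script = SCRIPT_TEMPLATE.format(jdbc_url=JDBC_URL, table=table)
        path = out / f"ingest_{table}.sh"
        path.write_text(script)
        path.chmod(0o755)

if __name__ == "__main__":
    generate_scripts()
```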
Confidential
Senior Database Developer
Environment: Hortonworks Hadoop, Hive, HDFS, Sqoop
Development Tools: Teradata SQL Assistant, PuTTY, Automic Scheduler, Jira, GitHub
Operating System: Unix, Windows 10
Responsibilities:
- Extensively developed stored procedures, PL/SQL packages, triggers, functions, and exception handling to implement the functionality expected by the business requirements.
- Performance-tuned problematic queries using features such as collections, EXPLAIN PLAN, TKPROF, indexes, and hints.
- Interacted with the development team and FSD owner to understand requirements.
- Performed impact analysis of the existing system with respect to new functionality.
- Developed database code and Oracle Apps setups according to new requirements.
- Maintained data replication through GoldenGate or the in-built Cisco tool (Cisco Data Bus).
- Prepared unit test cases and test data and performed unit testing.
- Ensured compliance with coding standards and maintained project-related documentation.
- Provided support to the Quality Assurance team.
- Tracked and maintained the weekly status report for the project.
- Reviewed and monitored work done by team members.
- Performed final deployment and post-production support for enhancements.