Data Engineer Resume Charlotte, NC - Hire IT People

SUMMARY

Data Engineer professional with 8+ years of combined experience in Data Engineering, Big Data implementations and Spark technologies.
Experience in Big Data ecosystems using Hadoop, Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Airflow, Snowflake, Teradata, Flume, Kafka, Yarn, Oozie, and Zookeeper.
Extensive exposure to Big Data technologies and the Hadoop ecosystem, with in-depth understanding of MapReduce and Hadoop infrastructure.
Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark and Hive.
Experience with the Apache Spark ecosystem using Spark Core, Spark SQL, DataFrames, and RDDs, and knowledge of Spark MLlib.
Experienced in data manipulation using Python for loading and extraction, as well as with Python libraries such as NumPy, SciPy and Pandas for data analysis and numerical computations.
Solid experience in and understanding of designing and operationalizing large-scale data and analytics solutions on Snowflake Data Warehouse.
Developed ETL pipelines into and out of the data warehouse using a combination of Python and SnowSQL.
Experience in extracting files from MongoDB through Sqoop, placing them in HDFS, and processing them.
Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
Implemented a cluster for the NoSQL tool HBase as part of a POC to address HBase limitations.
Strong knowledge of the architecture and components of Spark, and efficient in working with Spark Core.
Strong knowledge of Hive analytical functions and of extending Hive functionality by writing custom UDFs.
Expertise in writing MapReduce jobs in Python for processing large sets of structured, semi-structured and unstructured data and storing them in HDFS.
Good understanding of data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
Used Amazon Web Services Elastic Compute Cloud (AWS EC2) to launch cloud instances.
Hands-on experience working with Amazon Web Services (AWS) using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
Hands-on experience with SQL and NoSQL databases such as Snowflake, HBase, Cassandra and MongoDB.
Hands-on experience in setting up workflows using Apache Airflow and the Oozie workflow engine for managing and scheduling Hadoop jobs.
Experience in data warehousing and business intelligence across various domains.
Created Tableau dashboards designed for large data volumes from a SQL Server data source.
Extract, Transform and Load (ETL) source data into respective target tables to build the required data marts.
Active involvement in all scrum ceremonies - Sprint Planning, Daily Scrum, Sprint Review and Retrospective meetings and assisted Product owner in creating and prioritizing user stories.
Strong experience in working with UNIX/LINUX environments, writing shell scripts.
Worked with various formats of files like delimited text files, clickstream log files, Apache log files, Avro files, JSON files, XML Files.
Strong analytical, presentation, communication, and problem-solving skills, with the ability to work independently as well as in a team and to follow the best practices and principles defined for the team.

TECHNICAL SKILLS

Hadoop/Spark Ecosystem: Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Airflow, MongoDB, Cassandra, HBase, and Storm.

Hadoop Distribution: Cloudera distribution and Hortonworks.

Programming Languages: Scala, Hibernate, JDBC, JSON, HTML, CSS, SQL, R, Shell Scripting

Script Languages: JavaScript, jQuery, Python.

Databases: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake, NoSQL databases (HBase, MongoDB).

Operating Systems: Linux, Windows, Ubuntu, Unix

Web/Application server: Apache Tomcat, WebLogic, WebSphere

Tools: Eclipse, NetBeans

Data Visualization Tools: Tableau, Power BI, SAS, Excel, ETL

OLAP/Reporting: SQL Server Analysis Services and Reporting Services.

Cloud Technologies: MS Azure, Amazon Web Services (AWS).

Machine Learning Models: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor (KNN), Principal Component Analysis, Linear Regression, Naïve Bayes.

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Data Engineer

Responsibilities:

Responsible for the execution of big data analytics, predictive analytics, and machine learning initiatives.
Implemented a proof of concept deploying this product in an AWS S3 bucket and Snowflake.
Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation, queries and writing results back into an S3 bucket.
Experience in data cleansing and data mining.
Wrote, compiled, and executed programs as necessary using Apache Spark in Scala to perform ETL jobs with ingested data.
Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation and used Spark engine, Spark SQL for data analysis and provided to the data scientists for further analysis.
Prepared scripts to automate the ingestion process using Python and Scala as needed from various sources such as APIs, AWS S3, Teradata and Snowflake.
Designed and developed Spark workflows in Scala to pull data from AWS S3 buckets and Snowflake and apply transformations to it.
Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
Created Python scripts to read CSV, JSON and Parquet files from S3 buckets and load them into AWS S3, DynamoDB and Snowflake.
Implemented AWS Lambda functions to run scripts in response to events in an Amazon DynamoDB table or S3 bucket, or to HTTP requests using Amazon API Gateway.
Migrated data from an AWS S3 bucket to Snowflake by writing a custom read/write Snowflake utility function in Scala.
Worked on Snowflake schemas and data warehousing and processed batch and streaming data load pipelines using Snowpipe and Matillion from data lake Confidential AWS S3 bucket.
Profiled structured, unstructured, and semi-structured data across various sources to identify patterns in the data and implemented data quality metrics using the necessary queries or Python scripts based on the source.
Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run in Airflow.
Created DAGs using the Email Operator, Bash Operator, and Spark Livy operator to execute on an EC2 instance.
Deployed the code to EMR via CI/CD using Jenkins.
Extensively used Code cloud for code check-in and checkouts for version control.

Environment: Agile Scrum, MapReduce, Snowflake, Pig, Spark, Scala, Hive, Kafka, Python, Airflow, JSON, Parquet, CSV, Code cloud, AWS.

Confidential, Rochester, MN

Data Engineer

Responsibilities:

Worked on designing and developing the Real-Time Tax Computation Engine using Oracle, StreamSets, Kafka, Spark Structured Streaming and MySQL.
Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
Involved in ingestion, transformation, manipulation, and computation of data using StreamSets, Kafka, MySQL, Spark
Involved in data ingestion into MySQL using a Kafka-MySQL pipeline for full and incremental loads from a variety of sources such as web servers, RDBMS, and data APIs.
Worked on Spark Data sources, Spark Data frames, Spark SQL and Streaming using Scala.
Worked extensively on AWS Components such as Elastic Map Reduce (EMR), Elastic Compute Cloud (EC2), Simple Storage Service (S3)
Experience in developing Spark application using Scala SBT
Experience in integrating Spark-MySQL connector and JDBC connector to save the data processed in Spark to MySQL.
Responsible for creating tables and MySQL pipelines which are automated to load the data into tables from Kafka topics
Performed a POC to compare the time taken for Change Data Capture (CDC) of Oracle data across Stream, StreamSets and DB Visit.
Expertise in using different file formats like Text files, CSV, Parquet, JSON
Experience writing custom compute functions using Spark SQL and performing interactive querying.
Responsible for masking and encrypting the sensitive data on the fly
Responsible for creating multiple applications for reading the data from different Oracle instances to Kafka topics using Stream
Responsible for setting up a MySQL cluster on AWS EC2 Instance
Experience in Real time streaming the data using Spark with Kafka.
Responsible for creating a Kafka cluster using multiple brokers.
Experience working on Vagrant boxes to set up local Kafka and StreamSets pipelines.

Environment: Spark 2.2, Scala, Linux, MySQL 5.8, Kafka 1.0, Stream, StreamSets, Spark SQL, Spark Structured Streaming, AWS EC2, EMR, IntelliJ, SBT, Git, Vagrant.

Confidential, MI

Big Data Engineer

Responsibilities:

Extensively involved in Installation and configuration of Cloudera Hadoop Distribution.
Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
Developed Spark applications for performing large-scale transformations and denormalization of relational datasets.
Hands-on experience with Kafka-Storm on the HDP 2.2 platform for real-time analysis.
Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
Created reports for the BI team using Sqoop to export data into HDFS and Hive.
Performed analysis on the unused user navigation data by loading into HDFS and writing MapReduce jobs. The analysis provided inputs to the new APM front end developers and lucent team.
Loaded data from multiple data sources (SQL, DB2, and Oracle) into HDFS using Sqoop and then into Hive tables.
Created Hive queries to process large sets of structured, semi-structured and unstructured data and store it in managed and external tables.
Developed complex HiveQL queries using the JSON SerDe.
Created HBase tables to load large sets of structured data.
Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports
Performed real-time event processing of data from multiple servers in the organization using Apache Storm integrated with Apache Kafka.
Managed and reviewed Hadoop log files.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Worked on PySpark APIs for data transformations.
Ingested data into Hadoop (Sqoop imports) and performed validations and consolidations on the imported data.
Extended Hive and Pig core functionality by writing custom UDFs for data analysis.
Upgraded current Linux version to RHEL version 5.6
Expertise in hardening Linux servers and in compiling, building, and installing Apache Server from source with minimal modules.
Worked on JSON, Parquet, Hadoop File formats.
Worked on different Java technologies like Hibernate, Spring, JSP, and Servlets and developed both server-side and client-side code for our web application.
Used GitHub for continuous integration services.

Environment: Agile Scrum, MapReduce, Hive, Pig, Sqoop, Spark, Scala, Oozie, Flume, Java, HBase, Kafka, Python, Storm, JSON, Parquet, GIT, JSON SerDe, Cloudera.

Confidential

Data Analyst

Responsibilities:

Understand the data visualization requirements from the Business Users.
Writing SQL queries to extract data from the Sales data marts as per the requirements.
Developed Tableau data visualization using Scatter Plots, Geographic Map, Pie Charts and Bar Charts and Density Chart.
Designed and deployed rich, parameterized graphic visualizations with drill-down and drop-down menu options using Tableau.
Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
Explored traffic data from databases, connected it with transaction data, and presented and wrote a report for every campaign, providing suggestions for future promotions.
Extracted data using SQL queries and transferred it to Microsoft Excel and Python for further analysis.
Cleaned, merged, and exported the dataset in Tableau Prep.
Carried out data processing and cleaning techniques to reduce text noise and dimensionality in order to improve the analysis.

Environment: Python, Informatica v9.x, MS SQL SERVER, T-SQL, SSIS, SSRS, SQL Server Management Studio, Oracle, Excel.

Confidential

Data Analyst

Responsibilities:

Processed data received from vendors and loaded it into the database. The process was carried out on a weekly basis and reports were delivered on a bi-weekly basis. The extracted data had to be checked for integrity.
Documented requirements and obtained signoffs.
Coordinated between the Business users and development team in resolving issues.
Documented data cleansing and data profiling.
Wrote SQL scripts to meet the business requirement.
Analyzed views and produced reports.
Tested cleansed data for integrity and uniqueness.
Automated the existing system to achieve faster and accurate data loading.
Generated weekly and bi-weekly reports for the client business team using Business Objects and documented them.
Learned to create Business Process Models.
Ability to manage multiple projects simultaneously tracking them towards varying timelines effectively through a combination of business and technical skills.
Good understanding of clinical practice management, medical and laboratory billing, and insurance claim processing, with process flow diagrams.
Assisted QA team in creating test scenarios that cover a day in the life of the patient for Inpatient and Ambulatory workflows.

Environment: SQL, data profiling, data loading, QA team, Tableau, Python, Machine Learning models.
