Data Engineer/ Spark Developer Resume

Salt Lake City, Utah

PROFESSIONAL SUMMARY:

  • Overall 6+ years of IT experience with a strong emphasis on Design, Development, Implementation and Testing of software applications in Hadoop, HDFS, MapReduce, Hadoop Ecosystem, ETL and RDBMS. 3+ years of experience in Data Analysis, Data Mining, Acquisition, Validation, Visualization and discovering meaningful business insights in large data sets of structured and unstructured data.
  • Proficient in writing SQL queries for various RDBMS such as Microsoft SQL Server, MySQL, PostgreSQL, Teradata and Oracle.
  • Experienced in NoSQL databases such as MongoDB, HBase and Cassandra.
  • Experience in designing and implementing complete end-to-end Hadoop infrastructure, including MapReduce, Spark, HDFS architecture, Cassandra, HBase, Sqoop, Hive and Pig.
  • Good experience with the Hadoop framework and related technologies like HDFS, MapReduce, Pig, Hive, HBase, Spark, ZooKeeper, Kafka, Sqoop and Oozie.
  • Experience in managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring and maintenance, using the Hadoop distributions Cloudera CDH and Hortonworks HDP.
  • Experienced in working with cloud services such as Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP).
  • Experience with visualization tools like Tableau, Power BI and QlikView for creating dashboards.
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark Streaming and Spark SQL.
  • Experience in implementing real-time data processing and analytics pipelines using KStreams.
  • Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
  • Excellent knowledge of Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatches.
  • Good knowledge of OLAP, OLTP, Data Warehousing and Data Ingestion Pipelines.
  • Design and develop scalable, efficient data pipeline processes to handle the data ingestion, cleansing, transformation, integration and validation required to give cross-functional teams access to prepared data sets.
  • Strong SQL Server programming skills, with experience in working with functions, views, stored procedures, packages and triggers.
  • Expertise in ETL tools like Talend, Informatica, Microsoft SQL Server SSIS.
  • Experienced in writing HQL queries in the Hue editor to access data from the Hive data warehouse.
  • Strong experience in using Excel and MS Access to export data and analyze it based on business needs.
  • Experience writing in-house Unix/Linux shell scripts for Hadoop and Big Data development.
  • Worked with container-based technologies like Docker, Kubernetes and OpenShift.
  • Experience with collaborative tools such as SharePoint and JIRA.
  • Good knowledge of version control tools such as Git, GitHub and Bitbucket.
  • Knowledge of the Salesforce Customer Relationship Management (CRM) application and the Workday Enterprise Resource Planning (ERP) application.
  • Skilled in Microsoft Excel (e.g., formulas, pivot tables, graphing).
  • Extensive knowledge of Normalization and Relational Database Management Systems.
  • Expert in Data Visualization, Reporting and Analysis: Cross Map, Scatter Plots, Pie and Bar Charts, Page Trails, Dual Axis, Heat Maps, Bubble Charts, Tree Maps, Funnel Charts, Box Plots, Waterfall Charts and Geographic Visualization.
  • Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources and scheduling.
  • Experience working in both Waterfall and Agile methodologies.
  • Expert in using model pipelines to automate tasks and put models into production quickly.
  • Work effectively in fast-paced, multitasking environments, both independently and as part of a collaborative team.

TECHNICAL SKILLS:

Programming Languages: Python, SQL, Scala, SAS, Java, C#, C, RPA, HTML, CSS

Databases: Teradata, Microsoft SQL Server, MySQL, DB2, PostgreSQL, Oracle

NoSQL: HBase, Cassandra and MongoDB

Graph DB: Neo4j, Amazon Neptune

Big Data: Hadoop Ecosystem, HDFS, MapReduce, YARN, HBase, Spark, Kafka, ZooKeeper, Hive, Pig, Sqoop, Flume

Tools: Visual Studio, GitHub, SharePoint, Automation Anywhere, QlikSense, Jupyter, Alteryx, Matlab

Management: Agile Methodologies, Waterfall Project Management, SDLC, Scrum, Jira

Analysis and Visualization: Tableau, Seaborn, Advanced MS Excel, Power BI, QlikView

ETL Tools: Talend, Pentaho, Informatica, Microsoft SQL Server SSIS

Soft skills: Problem Solving, Leadership, Delegation, Motivation, and Teamwork

PROFESSIONAL EXPERIENCE:

Confidential, Salt Lake City, Utah

Data Engineer/ Spark Developer

Responsibilities:

  • Installed, configured, monitored and maintained Hadoop cluster on Big Data platform.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
  • Deployed, automated, maintained and managed AWS cloud-based production systems (EC2, Redshift, EMR and Elasticsearch) together with Hadoop, Python, Spark and MapReduce, ensuring the availability, performance, scalability and security of production systems.
  • Worked in an AWS environment using S3, Athena, Lambda, AWS Glue, AWS CloudFormation, QuickSight, EC2, EMR and API Gateway.
  • Performed data extraction, aggregation and consolidation of Adobe data within AWS Glue.
  • Designed and developed ETL/ELT processes to handle data migration from multiple business units using AWS Glue.
  • Used HiveQL to analyze partitioned and bucketed data, and executed Hive queries on Parquet tables stored in Hive to perform data analysis that met the business specification logic.
  • Built real-time data pipelines by developing Kafka producers and Spark Streaming applications to consume the data.
  • Worked with teams to use KSQL for real-time analytics.
  • Wrote Hive queries on the analyzed data for aggregation and reporting.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
  • Developed end-to-end data processing pipelines that begin with receiving data through the Kafka distributed messaging system and end with persisting the data into HBase.
  • Used Flume to collect, aggregate and store dynamic web log data from different sources, such as web servers and mobile devices, and pushed it to HDFS.
  • Stored and rapidly updated data in HBase, providing key-based access to specific data.
  • Extracted files from Cassandra and MongoDB through Sqoop and placed them in HDFS for processing.

Environment: Hadoop, HDFS, Hive, MapReduce, Sqoop, Spark, Flume, YARN, Pig, HBase, Cassandra, MongoDB, EC2, S3, EMR, QuickSight, IntelliJ

Confidential, Memphis, Tennessee

Hadoop / Big Data Developer

Responsibilities:

  • Loaded data into HDFS by developing solutions, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.
  • Created multiple Hive tables, implemented partitioning, dynamic partitioning and buckets in Hive for efficient data access.
  • Used Pig UDFs to do data manipulation, transformations, joins and some pre-aggregations.
  • Used Flume to collect, aggregate and store dynamic web log data from different sources, such as web servers and mobile devices, and pushed it to HDFS.
  • Configured Kafka to read and write messages from external programs.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Stored and rapidly updated data in HBase, providing key-based access to specific data.
  • Extracted files from Cassandra and MongoDB through Sqoop and placed them in HDFS for processing.
  • Configured Spark to optimize data processing.
  • Configured ZooKeeper and worked on Hadoop High Availability with the ZooKeeper failover controller to support a scalable, fault-tolerant data solution.
  • Worked on the Oozie workflow engine for job scheduling.
  • Created HDFS snapshots for data backup, protection against user errors and disaster recovery.

Environment: Hadoop, HDFS, Hive, MapReduce, MySQL, Oracle, Sqoop, Spark, SQL, Talend, Flume, YARN, Pig, Oozie, Linux-Ubuntu, Java, Agile methodologies

Confidential, Irving, Texas

Data Engineer

Responsibilities:

  • Gathered and analyzed business requirements, interacted with various business users, project leaders and developers, and took part in identifying different data sources.
  • Compiled data from multiple data sources and used SQL, Python and SAS packages for data extraction, loading and transformation.
  • Performed data manipulation and data engineering involving structured and unstructured data.
  • Analyzed large amounts of information to discover trends and patterns.
  • Conducted data analysis using SQL and Teradata to assist in the development of the solution.
  • Collaborated with engineering and product development teams.
  • Participated in the implementation of the Hadoop platform and Big Data technologies: Spark, Hive and HBase.
  • Performed data cleaning, feature scaling, feature engineering and feature prioritization using the pandas and NumPy packages in Python.
  • Created and modified database triggers, stored procedures and complex analytical queries, including multi-table joins, nested queries and correlated subqueries, and optimized their performance.
  • Performed exploratory data analysis (EDA) and summarized descriptive statistics.
  • Handled anomalies in the data by removing duplicates, imputing missing values and treating null values.
  • Visualized the data with box plots and scatter plots in Tableau to understand its distribution.
  • Collaborated with data scientists to prototype predictive models for converting data to insights.
  • Translated high-level project requirements into technical tasks.
  • Created charts and graphs of data from different data sources using the Matplotlib and SciPy libraries in Python.

Environment: Python, SQL, Anaconda, Jupyter Notebooks, Tableau Desktop, Jira, Git, Microsoft Excel

Confidential

Data Analyst

Responsibilities:

  • Developed test cases and SQL test scripts based on detailed data design, detailed functional design and ETL specifications.
  • Defined and documented detailed ETL specifications for Data Warehouse interface, extracts, staging areas, atomic areas, and mart database environments.
  • Performed data manipulation operations such as importing and exporting data in various external file formats using SSIS.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, and normalization (3NF) and denormalization of databases.
  • Gained experience in GUIs, Relational Database Management Systems (RDBMS), design of OLAP system environments and report development.
  • Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using Tableau and Power BI.
  • Imported and cleansed high-volume data from various sources such as Teradata, Oracle and flat files.
  • Developed code per the client's requirements using SQL, PL/SQL and data warehousing concepts.
  • Implemented a metadata repository; maintained data quality, data standards, the data governance program, scripts, stored procedures and triggers; and executed test plans.
  • Used advanced T-SQL features to design and tune T-SQL code that interfaces efficiently with the database and other applications, and created stored procedures for business logic using T-SQL.
  • Developed and modified existing OLAP cubes for better performance using an ETL tool.
  • Coordinated JIRA stories for various projects, ensuring that all projects met their deadlines.

Environment: MySQL, Python, Jira, Teradata, Talend, Informatica, Oracle, T-SQL, Tableau, Power BI
