
Sr. Data Engineer Resume

Columbus, OH

SUMMARY

  • Data engineering professional with solid foundational skills and a proven track record of implementations across a variety of data platforms. Self-motivated, with strong personal accountability in both individual and team settings.
  • 8+ years of experience designing, developing, and integrating applications using Hadoop, Hive, and Pig on all three big data platforms (Cloudera, Hortonworks, and MapR), as well as Snowflake and Apache Airflow, including 5 years in data engineering and 4 years in data warehousing.
  • Experience writing Pig and Hive scripts.
  • Experience writing MapReduce programs with Apache Hadoop to analyze big data.
  • Hands-on experience writing ad hoc queries to move data from HDFS into Hive and analyzing the data using HiveQL.
  • In-depth knowledge of Hadoop architecture and Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker, and TaskTracker.
  • Good working knowledge of Snowflake and Teradata databases.
  • Worked extensively with Spark and Scala on the cluster for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
  • Expertise in Python data extraction and manipulation; used Python libraries such as NumPy, Pandas, and Matplotlib extensively for data analysis.
  • Worked extensively with libraries such as Seaborn (visualization) and scikit-learn (machine learning); familiar with TensorFlow for deep learning and NLTK for natural language processing.
  • Experienced in sizing Snowflake multi-cluster warehouses and managing credit usage.
  • Played a key role in migrating Teradata objects into the Snowflake environment.
  • Experience with Snowflake virtual warehouses, including multi-cluster warehouses.
  • In-depth knowledge of Snowflake Data Sharing.
  • Knowledge of Snowflake database, schema, and table structures.
  • Experience using Snowflake zero-copy cloning and Time Travel.
  • Solid understanding of and experience with extract, transform, load (ETL) processes.
  • Implemented data movement from the file system to Azure Blob Storage using the Python API.
  • Implemented a PoC for running ML models on Azure ML Studio.
  • Wrote a Kafka consumer to move Adobe clickstream JSON objects from a topic into the data lake.
  • Experience working with file formats such as text, SequenceFile, Parquet, and Avro.
  • Implemented time-series processing with the MapReduce paradigm using Java and Spark.
  • Expertise using Sqoop and Spark to load data from MySQL/Oracle into HDFS or HBase.
  • Well versed in ETL methodology for supporting corporate-wide solutions using Informatica 7.x/8.x/9.x.
  • Implemented data warehouse solutions using Snowflake.
  • Proficient in integrating various data sources, including relational databases such as Oracle 11g/10g/9i, Sybase 12.5, and Teradata as well as flat files, into the staging area, data warehouse, and data marts.
  • Expertise in database tuning and optimization (SQL tuning).
  • Involved in all phases of the ETL life cycle, from scope analysis, design, and build through production support.

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Columbus, OH

Responsibilities:

  • Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Flume, Spark, Avro, ZooKeeper, etc.).
  • Handled raw data from various subsystems and loaded it into HDFS for further processing.
  • Developed Apache Pig scripts and UDFs extensively for data transformations, statement date format calculations, and aggregation of monetary transactions.
  • Integrated Apache Kafka for data ingestion.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Sqoop scripts to enable data exchange between Pig and the MySQL database.
  • Involved in migrating objects from Teradata to Snowflake.
  • Created Snowpipes for continuous data loading.
  • Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.
  • Heavily involved in testing Snowflake to determine the best way to use cloud resources.
  • Implemented various machine learning algorithms to analyze market trends and their effects on the company.
  • Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries.
  • Applied performance-tuning techniques at various stages of the migration process.
  • Developed Hive scripts implementing dynamic partitions.
  • Experienced with optimized join strategies such as map joins and sort-merge bucket (SMB) map joins.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive for analytical processing by business users.
  • Expertise in performance tuning of Hive queries, joins, and configuration parameters to improve query response time.
  • Fully involved in the requirements analysis phase.
  • Implemented a POC with programs written in Scala, processing data using Spark SQL.
  • Conducted a POC for Hadoop and Spark as part of the next-generation platform implementation.
  • Implemented a recommendation engine in Scala.
  • Set up cron jobs to delete Hadoop logs, old local job files, and cluster temp files.
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Tested raw data and executed performance scripts.
  • Set up Hive with MySQL as a remote metastore.
  • Exported analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
  • Moved all log/text files generated by various products into HDFS.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Streamlined Hadoop jobs and workflow operations using Oozie.
  • Wrote and implemented unit test cases using JUnit.
  • Worked with testing teams and resolved defects.

Environment: Hadoop (CDH5), UNIX, Scala, Python, Storm, Databricks, Azure Spark, Spark SQL, MapReduce, Apache Pig, Hive, Impala, Sqoop, Java, Eclipse, Kafka, MySQL, and Oozie.

Data Engineer

Confidential, Bothell, WA

Responsibilities:

  • Developed Spark applications using Scala.
  • Analyzed batch job performance using Spark tuning parameters.
  • Enhanced and optimized Spark/Scala/PySpark jobs to aggregate, group, and run data mining tasks using the Spark framework.
  • Worked with AWS services such as Kinesis, DynamoDB, and S3.
  • Imported and exported data into HDFS and Hive using Sqoop and Kafka for batch and streaming loads.
  • Used a MySQL RDBMS as the backend database storing monitoring information for the CCPA project.
  • Used the ServiceNow platform to open and close tickets for the CCPA project.
  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS using several tools.
  • Imported data in formats such as JSON, ORC, and Parquet into the HDFS cluster, using compression for optimization.
  • Ingested data from RDBMS sources such as Oracle, SQL Server, and Teradata into HDFS using Sqoop.
  • Developed and deployed PySpark applications on a Databricks cluster.
  • Managed and reviewed large Hadoop log files.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which consumes data from Kafka in near real time and persists it into HBase.
  • Analyzed the performance of Spark streaming and batch jobs using Spark tuning parameters.
  • Enhanced and optimized Spark/Python jobs to aggregate, group, and run data mining tasks using the Spark framework.
  • Installed, configured, and developed various pipeline activities in NiFi using processors such as Sqoop, Kafka, HDFS, and file processors.
  • Created data pipelines per business requirements and scheduled them using Oozie.
  • Used Hive to join multiple tables from a source system and load the results into Elasticsearch indices.
  • Imported data in formats such as JSON, SequenceFile, text, CSV, Avro, and Parquet into the HDFS cluster, using compression for optimization.
  • Configured Hive and wrote Hive UDFs and UDAFs; also created static and dynamic partitions with bucketing.
  • Designed and created analytical reports and automated dashboards to help users identify critical KPIs and support strategic planning in the organization.
  • 4+ months of experience indexing data into Elasticsearch, building dashboards in Kibana, and using Kibana as a search tool on Elasticsearch.
  • Used the CI/CD tool Jenkins for code deployment and job scheduling.
  • Worked with other ML teams.
  • Created metrics and processed data using a query exporter and Prometheus dashboards.
  • Good knowledge of and interest in data science and machine learning concepts; spent time understanding the code and project details of the machine learning teams in my Wal-Mart branch.

Environment: Hive, Prometheus, PySpark, Jenkins, Airflow, Gerrit, Kafka, Spark, Sqoop, Maven, Automic, SQL, Scala, JUnit, IntelliJ, MySQL, Databricks, AWS Cloud.

Data Engineer

Confidential, Eagan, MN

Responsibilities:

  • Responsible for requirements gathering and analyzing the data sources.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked very closely with the Hadoop Administrator to set up the Hadoop cluster.
  • Analyzed the Hadoop cluster and various big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
  • Generated OBIEE reports to verify Hive table data.
  • Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Worked in the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Loaded data from the Linux file system into HDFS.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization.
  • Created Hive tables to store PII data in varying formats coming from different portfolios.
  • Implemented a script to transmit sysprin information from Oracle to Hive using Sqoop.
  • Designed and developed MapReduce jobs to analyze the data.
  • Implemented best income logic using Pig scripts and UDFs.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Responsible for managing data coming from different sources.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Provided cluster coordination services through ZooKeeper.
  • Developed Spark scripts using Scala shell commands as required.
  • Experience in managing and reviewing Hadoop log files.
  • Responsible for cluster maintenance, monitoring, and troubleshooting, as well as managing and reviewing data backups and Hadoop log files.
  • Experience using Amazon Web Services.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Involved in email campaign launches and served as the point of contact for all data-related tasks, such as pulling and matching customer lists; completed training on the StrongView email platform.

Environment: MapR, Hadoop MapReduce, HDFS, Spark, Hive, Pig, SQL, Sqoop, Flume, Oozie, Java 8, Eclipse, HBase, Shell Scripting, Scala.

Data Engineer

Confidential

Responsibilities:

  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Extracted, transformed, and loaded data sources to generate CSV data files using Python and SQL queries.
  • Handled large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcast variables, efficient joins, transformations, and other techniques.
  • Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
  • Worked on a POC comparing the processing time of Impala and Apache Hive for batch applications, with a view to adopting Impala in the project.
  • Worked on a 130-node cluster.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Analyzed SQL scripts and designed a solution to implement them in PySpark.
  • Developed data pipelines on AWS to extract data from web logs and store it in HDFS.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Used reporting tools such as Tableau to connect to Hive and generate daily reports.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.

Environment: MapR, Hadoop MapReduce, HDFS, Spark, Hive, Pig, SQL, Sqoop, Flume, Oozie, Java 8, Eclipse, HBase, Shell Scripting, Scala.

Data Analyst/Data Modeler

Confidential 

Responsibilities:

  • As a Data Analyst/Data Modeler, reviewed business requirements and composed source-to-target data mapping documents.
  • Used Sybase PowerDesigner for relational database and dimensional data warehouse designs.
  • Extensively used star schema methodologies in building and designing logical and physical data models as dimensional models.
  • Designed and developed databases for OLTP and OLAP Applications.
  • Used SQL Profiler and Query Analyzer to optimize DTS package queries and stored procedures.
  • Created SSIS packages to populate data from various data sources.
  • Performed extensive data analysis using Python Pandas.
  • Extensively involved in the modeling and development of the reporting data warehouse system.
  • Developed reports and visualizations using Tableau Desktop as per the requirements.
  • Extensively used Informatica tools: Source Analyzer, Warehouse Designer, and Mapping Designer.
  • Involved in migrating and converting reports from SSRS.
  • Migrated data from SQL Server to Oracle and uploaded as XML-Type in Oracle Tables.
  • Performed data dictionary mapping and extensive database modeling (relational and star schema) using Sybase PowerDesigner.
  • Involved in performance tuning and monitoring of both T-SQL and PL/SQL blocks.
  • Created DDL and DML scripts for the data models.
  • Created volatile and global temporary tables to load large volumes of data into the Teradata database.
  • Involved in designing Business Objects universes and creating reports.
  • Provided production support to resolve user issues for applications using Excel VBA.
  • Used Tableau for analysis and to create dashboards and user stories.
  • Performed extensive data validation using SQL queries and back-end testing.
  • Performed gap analysis between the current and desired states and documented requirements to address the identified gaps.

Environment: Sybase PowerDesigner, SQL, PL/SQL, Teradata 14, Oracle 11g, XML, Tableau, OLAP, OLTP, SSIS, Informatica, SSRS, Python, T-SQL, Excel.
