
Data Engineer Resume


Arkansas

SUMMARY

  • 8+ years of professional experience in information technology, with expert hands-on work in Big Data, Hadoop, Spark, Hive, Sqoop, SQL tuning, ETL development, report development, database development, and data modeling across a variety of IT projects.
  • Strong knowledge of and experience with the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume, HBase, Oozie, Kafka, Pig), data pipelines, and data analysis and processing with Hive SQL, Spark, and Spark SQL. Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Dataflow, and the gsutil and bq command-line utilities.
  • Experience with Amazon AWS services such as EMR, EC2, S3, CloudFormation, and Redshift, which provide fast and efficient processing of Big Data.
  • Developed Spark applications in Scala to ease Hadoop transitions. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive, and wrote Spark and Spark SQL/Streaming code for faster testing and processing of data (a minimal batch sketch appears after this list).
  • Data ingestion into Hadoop (HDFS): ingested data from sources such as Oracle and MySQL using Sqoop, and created Sqoop jobs with incremental loads to populate Hive external tables. Imported real-time data into Hadoop using Kafka and also worked with Flume (see the streaming ingestion sketch after this list). Exported analyzed data back to relational databases with Sqoop for visualization and BI reporting.
  • File formats: ran Hadoop streaming jobs to process terabytes of text data and worked with file formats such as Text, SequenceFile, Avro, ORC, and Parquet.
  • Strong knowledge of and experience with Spark architecture and components; efficient with Spark Core, Spark SQL, and Spark Streaming. Implemented Spark Streaming jobs built on RDDs (Resilient Distributed Datasets), using PySpark and spark-shell as appropriate.
  • Experience working with Teradata, Oracle, and MySQL databases.
  • Experience working in an onsite-offsite-offshore model.
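Below is a minimal sketch of the kind of Spark/Scala batch job over Hive data described in the summary above; the database, table, and output path names are hypothetical placeholders, not taken from a specific engagement.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveSalesSummary {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport lets Spark SQL read tables registered in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("hive-sales-summary")
          .enableHiveSupport()
          .getOrCreate()

        // Aggregate a hypothetical Hive table into daily revenue per store.
        val daily = spark.table("retail.sales_transactions")
          .groupBy(col("store_id"), col("sale_date"))
          .agg(sum("amount").as("daily_revenue"))

        // Persist the summary as Parquet, partitioned by date, for downstream reporting.
        daily.write
          .mode("overwrite")
          .partitionBy("sale_date")
          .parquet("/data/curated/daily_store_revenue")

        spark.stop()
      }
    }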
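The real-time ingestion path mentioned above could look roughly like the following Spark Structured Streaming job reading from Kafka; the broker address, topic, and HDFS paths are assumptions for illustration, and the spark-sql-kafka connector is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfsIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hdfs-ingest")
          .getOrCreate()

        // Subscribe to a Kafka topic and project the key/value payload as strings.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "orders")
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(key AS STRING) AS order_key",
                      "CAST(value AS STRING) AS payload",
                      "timestamp")

        // Land the events on HDFS as Parquet; the checkpoint directory tracks progress.
        val query = events.writeStream
          .format("parquet")
          .option("path", "/data/raw/orders")
          .option("checkpointLocation", "/checkpoints/orders_ingest")
          .start()

        query.awaitTermination()
      }
    }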

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Flume, Sqoop, Impala, Spark, Parquet, Snappy, ORC, Ambari and Tez

Hadoop Distributions: Cloudera, Hortonworks, MapR

Languages: Java, SQL, Scala, PySpark and C/C++

Tools: Eclipse, Maven and IntelliJ

Scheduling and Automation: Shell scripts, Oozie and Automic workflows (Automic Scheduler)

DB Languages: MySQL and PL/SQL

RDBMS: Oracle, MySQL and DB2

Operating systems: UNIX, LINUX and Windows Variants

PROFESSIONAL EXPERIENCE

Confidential, Arkansas

Data Engineer

Responsibilities:

  • Built multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
  • Loaded SAP transactional data into the BigQuery raw and UDM layers on a 30-minute incremental schedule using SQL, Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts.
  • Built a configurable Scala and Spark framework to connect to common data sources such as MySQL, Oracle, SQL Server, and SAP HANA and load them into BigQuery (a minimal sketch appears after this list).
  • Monitored BigQuery and Cloud Dataflow jobs via the Automic scheduler across all environments.
  • Opened SSH tunnels to Google Dataproc to reach the YARN resource manager and monitor Spark jobs.
  • Submitted Spark jobs for execution on the GCP Dataproc cluster, staging job artifacts with gsutil.
  • Understood the business needs and objectives of the system and gathered requirements from a reporting perspective.
  • Developed HQL queries to perform DDL and DML operations.
  • Developed transformation scripts in HiveQL to implement business logic on the data lake.
  • Developed Automic workflows between Hadoop, Hive, and Teradata using the Aorta framework.
  • Scheduled jobs in Automic to automate Aorta framework workflows.
  • Designed and developed data pipelines using YAML scripts and the Aorta framework.
  • Ingested data from Teradata into Hadoop using the Aorta framework.
  • Created, validated, and maintained Aorta scripts to load data from various data sources into HDFS.
  • Created external tables to read data into Hive from RDBMS sources.
  • Developed shell scripts to build a framework between Hive and Teradata.
  • Unit tested Aorta jobs and Automic workflows.
  • Developed data quality (DQ) scripts to validate and maintain data quality for downstream applications.
  • Validated ThoughtSpot data against base tables in the data lake.
  • Used Git for version control.
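One leg of the configurable load framework described above could be sketched as a Spark/Scala job on Dataproc, assuming the spark-bigquery connector and a MySQL JDBC driver are available on the cluster; the connection URL, table, dataset, and staging bucket names are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object JdbcToBigQueryLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("jdbc-to-bigquery-load")
          .getOrCreate()

        // Pull a source table over JDBC; in a configurable framework the URL,
        // driver, and table names would come from configuration, not constants.
        val source = spark.read
          .format("jdbc")
          .option("url", "jdbc:mysql://source-host:3306/sales")
          .option("dbtable", "orders")
          .option("user", sys.env("DB_USER"))
          .option("password", sys.env("DB_PASSWORD"))
          .load()

        // Write to the BigQuery raw layer; the connector stages data through GCS.
        source.write
          .format("bigquery")
          .option("table", "raw_layer.orders")
          .option("temporaryGcsBucket", "my-staging-bucket")
          .mode("append")
          .save()

        spark.stop()
      }
    }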

Environment: GCP, Hadoop, MapReduce, Sqoop, Hive, Automic, Spark, Teradata.

Confidential, New Hampshire

Hadoop Developer

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Implemented Spark applications in Scala, using DataFrames and the Spark SQL API for faster data processing.
  • Worked with Spark Structured Streaming to process millions of files in S3 buckets and persist them back to S3 in partitions (see the sketch after this list).
  • Implemented synchronization between Hive and Athena for reporting.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Sqoop to import and export data between HDFS and RDBMS systems.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
  • Involved in migrating ETL processes from Oracle to Hive to evaluate easier data manipulation.
  • Worked on NoSQL databases including HBase and Cassandra.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Used Git for version control.
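A minimal sketch of the Structured Streaming pattern referenced above, assuming JSON input files and the s3a connector; the bucket names, schema, and partition column are illustrative assumptions only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object S3FileStreamCompactor {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("s3-file-stream-compactor")
          .getOrCreate()

        // The file source needs an explicit schema; this one is purely illustrative.
        val schema = new StructType()
          .add("event_id", StringType)
          .add("event_type", StringType)
          .add("event_ts", TimestampType)

        // Treat new JSON files landing in the input bucket as a stream.
        val events = spark.readStream
          .schema(schema)
          .json("s3a://example-input-bucket/events/")
          .withColumn("event_date", to_date(col("event_ts")))

        // Persist back to S3 as Parquet, partitioned by date, with checkpointing.
        val query = events.writeStream
          .format("parquet")
          .option("path", "s3a://example-output-bucket/events_parquet/")
          .option("checkpointLocation", "s3a://example-output-bucket/_checkpoints/events/")
          .partitionBy("event_date")
          .start()

        query.awaitTermination()
      }
    }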

Confidential

SQL Server Developer

Responsibilities:

  • Translated business needs into data analysis, business intelligence data sources and reporting solutions for different types of information consumers.
  • Involved in planning, defining, and designing the database in Erwin based on business requirements, and provided documentation.
  • Developed complex T-SQL code such as stored procedures, functions, triggers, indexes, and views for the application.
  • Used joins and correlated and non-correlated sub-queries in complex business queries involving multiple tables and calculations across different databases.
  • Created many complex Stored Procedures/Functions and used them in Reports directly to generate reports on the fly.
  • Created SSIS packages to load data from XML files and SQL Server 2005 into SQL Server 2008 using Lookup, Fuzzy Lookup, Derived Column, Conditional Split, Term Extraction, Aggregate, Pivot, and Slowly Changing Dimension transformations.
  • Converted Crystal Reports to SQL Server Reports.
  • Used SSIS to create ETL packages to Validate, Extract, Transform and Load data to Data Warehouse and Data Mart Databases.
  • Developed Unit Test Plans for SSIS packages for checking the functionality of each component.
  • Modified the existing SSIS packages to meet the changes specified by the users.
  • Scheduled jobs with SQL Server Agent to execute the SSIS packages developed to update the database on a daily basis.

Confidential

Responsibilities:

  • Gathered user requirements to develop the business logic and analyzed and designed database solutions.
  • Generated Performance reports on SSIS packages, Stored Procedures and Triggers.
  • Created OLAP cubes using SSAS to analyze the business condition of the sales.
  • Wrote complex queries for drill-down reports and applied conditional formatting using SSRS.
  • Used SSIS to transfer data from various sources like SQL Server, MS Access to be used for generating reports using SSRS 2008.
  • Formatted reports for various Output formats as requested by the management using SSRS.
  • Actively supported business users for changes in reports as and when required using SSRS.
  • Analyzed reports and fixed bugs in stored procedures using SSRS.
  • Created linked reports and managed snapshots using SSRS.
  • Performed various calculations using complex expressions in the reports and created report models.
  • Designed and optimized indexes, views, stored procedures, and functions using T-SQL.
  • Performed maintenance duties like performance tuning and optimization of queries, functions and stored procedures.
  • Designed high level SSIS architecture for overall data transfer from the source server to the Enterprise Services Warehouse.
