Sr. Hadoop Infrastructure Engineer / Big Data Developer Resume

Deerfield, IL

SUMMARY:

  • A results-oriented software development professional with a bottom-line focus and a proven track record of over 8 years, specializing in the design and development of Big Data technologies on highly scalable, end-to-end Hadoop infrastructure to solve business problems involving large-scale data pipelines, data lakes, data warehouses, real-time analytics, and reporting solutions.
  • Experience with and deep understanding of the overall Hadoop ecosystem, including HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, YARN, Hue, and Spark.
  • Experience providing architectures for new proposals using different cloud technologies, Hadoop ecosystem tools, and reporting and modeling tools.
  • Leveraged AWS, Informatica Cloud, Snowflake Data Warehouse, the HashiCorp platform, AutoSys, and Rally Agile/Scrum to implement data lake, enterprise data warehouse, and advanced data analytics solutions based on data collection and integration from multiple sources (Salesforce, Salesconnect, S3, SQL Server, Oracle, NoSQL, and mainframe systems).
  • Involved in the architectural design and development of highly scalable, optimized data models, data marts, a Snowflake data warehouse, data lineage, and a metadata repository, using Jenkins, Vagrant, Vault, GitHub Enterprise/Git-Bash, and Terraform as IaC to provision cloud infrastructure and security.
  • Implemented an AWS data lake leveraging S3, Terraform, Vagrant/Vault, EC2, Lambda, VPC, and IAM for data processing and storage, while writing complex SQL queries and analytical and aggregate functions on views in the Snowflake data warehouse to develop near-real-time visualizations using Tableau Desktop/Server 10.4 and Alteryx.
  • Performed data masking and ETL processes using S3, Informatica Cloud, Informatica PowerCenter, and Informatica Test Data Management to support the Snowflake data warehousing solution in the cloud.
  • Experience and deep understanding of Solr, Banana, and D3.js for creating dashboards and search criteria over indexed data.
  • Experience and deep understanding of data modeling and analytics using Python.
  • Experience creating Google Analytics reports on market trends and measurements.
  • Experience designing and developing data ingestion using Spark with Scala/Java, Apache NiFi, Apache Camel, Spark Streaming, Kafka, Flume, Sqoop, and shell scripts (see the ingestion sketch after this list).
  • Experience building Hadoop clusters and developing on AWS using Amazon EC2 and S3.
  • Experience building Hadoop applications using the Lambda architecture.
  • Experience designing and developing solutions using Hadoop, AngularJS, .NET, and QlikView.
  • Experience designing end-to-end Hadoop cluster setups using the Cloudera and Hortonworks distributions as well as plain Apache Hadoop.
  • Good knowledge of Apache Flink.
  • Experience working with RDBMSs (Microsoft SQL Server, Oracle, and MySQL), Vertica, Teradata, and NoSQL (HBase).
  • Experience working in an Agile framework with Jira and TFS as a Scrum Master.
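
Below is a minimal sketch of the Kafka-to-HDFS ingestion pattern described above, written with Spark Structured Streaming in Scala. The broker address, topic name, and HDFS paths are placeholders, and it assumes the spark-sql-kafka connector is on the classpath; the actual pipelines also drew on NiFi, Flume, and Sqoop depending on the source.

    import org.apache.spark.sql.SparkSession

    object KafkaIngestion {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hdfs-ingestion")
          .getOrCreate()

        // Read the raw event stream from Kafka (broker and topic are placeholders).
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .option("startingOffsets", "latest")
          .load()

        // Kafka delivers key/value as binary; keep the value as a string plus the event timestamp.
        val events = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

        // Land the stream on HDFS as Parquet, checkpointing offsets so file output is exactly-once.
        val query = events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/landing/events")
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .start()

        query.awaitTermination()
      }
    }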

TECHNICAL SKILLS:

Languages: Go, Java, Scala, Python, JavaScript (Node.js, AngularJS), HTML, XML, JSON, SQL, PL/SQL, Objective-C.

IaaS: AWS EC2, Google Cloud Platform

Containers: Docker

Distributed databases: Cassandra, HBase, MongoDB

Distributed query engine: AWS Athena, Hive, Presto

Distributed file systems: HDFS, S3.

Distributed computing environment: Amazon EMR, Hortonworks

Security: Ranger, Knox, Atlas, HDFS Encryption

Operations: Ambari, ZooKeeper

Scheduling: Oozie

Data Governance: Atlas, Falcon

Data Flow: NiFi, MiNiFi, Sqoop, Flume, Kafka, WebHDFS, Amazon Kinesis, Firehose

Distributed data processing: Hadoop, Spark, Storm

Decentralized technology: Blockchain, Ethereum, Hyperledger

Search & indexing: Solr / Lucene, Elasticsearch

Relational databases: Oracle, MySQL, IBM DB2, MS SQL Server

Operating Systems: Windows, *NIX (Linux, AIX, Solaris)

Source Control: Git, Subversion

BPM: IBM BPM, Vitria.

Application servers: BEA WebLogic, WebSphere, JBoss

PROFESSIONAL EXPERIENCE:

Confidential, Deerfield IL

Sr. Hadoop Infrastructure Engineer / Big Data Developer

Responsibilities:

  • Designed applications from ingestion to report delivery to third-party vendors using big data technologies: Flume, Kafka, Sqoop, MapReduce, Hive, and Pig.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them using MapReduce programs.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and exported data from HDFS to MySQL using Sqoop.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experience in the AWS cloud environment and with S3 storage.
  • Tuned and optimized AWS Redshift performance by using the correct DISTKEY and SORTKEY.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Loaded huge amounts of data into HDFS using Apache Kafka.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed UDFs in Java for Hive and Pig, and worked on reading multiple data formats from HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Processed raw log files from the set-top boxes using Java MapReduce code and shell scripts, and stored them as text files in HDFS.
  • Used storage formats such as Avro to quickly access multi-column data in complex queries.
  • Ingested data from legacy and upstream systems into HDFS using Apache Sqoop, Flume, Java MapReduce programs, Hive queries, and Pig scripts.
  • Analyzed the SQL scripts and designed the solution for implementation in Scala.
  • Generated the required reports for the operations team from the ingested data using Oozie workflows and Hive queries.
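
A minimal sketch of the Hive-to-Spark conversion referenced above: an illustrative grouped-count HiveQL query re-expressed as Spark RDD transformations in Scala. The table and column names are hypothetical, and the cluster is assumed to expose its Hive metastore to Spark.

    import org.apache.spark.sql.SparkSession

    object HiveToSparkRdd {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark-rdd")
          .enableHiveSupport()   // read tables registered in the Hive metastore
          .getOrCreate()

        // Original HiveQL (illustrative):
        //   SELECT vendor_id, COUNT(*) FROM shipments WHERE status = 'DELIVERED' GROUP BY vendor_id
        val shipments = spark.table("shipments").rdd

        val deliveredPerVendor = shipments
          .filter(row => row.getAs[String]("status") == "DELIVERED")  // WHERE clause
          .map(row => (row.getAs[String]("vendor_id"), 1L))           // project the grouping key
          .reduceByKey(_ + _)                                         // GROUP BY ... COUNT(*)

        deliveredPerVendor
          .map { case (vendor, cnt) => s"$vendor\t$cnt" }
          .saveAsTextFile("hdfs:///reports/delivered_per_vendor")

        spark.stop()
      }
    }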

Confidential, Merrimack NH

Sr. Big Data / Hadoop Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Pig Latin and Java for data analysis.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files, and developed Pig scripts for analyzing large data sets in HDFS.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Experienced in migrating HiveQL queries to Impala to minimize query response time.
  • Knowledgeable in handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Performed extensive data mining using Hive. Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Created Sqoop jobs, Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Used Kafka to load data into HDFS and move data into NoSQL databases such as Cassandra.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios (see the sketch after this list).
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
  • Exported data to Tableau and Excel with Power View for presentation and refinement.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources; also implemented Hive generic UDFs for business logic.
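
A minimal sketch in Scala of loading records into an HBase table like the ones described above, using the standard hbase-client API. The table name, column family, and row key layout are placeholders, and hbase-site.xml is assumed to be on the classpath.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseLoader {
      def main(args: Array[String]): Unit = {
        // Picks up hbase-site.xml from the classpath for the ZooKeeper quorum, etc.
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        val table = connection.getTable(TableName.valueOf("portfolio_events"))

        try {
          // One Put per record; the row key and column values are illustrative.
          val put = new Put(Bytes.toBytes("acct-1001#2017-06-01"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("unix-logs"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("""{"status":"ok"}"""))
          table.put(put)
        } finally {
          table.close()
          connection.close()
        }
      }
    }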

Confidential, Austin TX

Hadoop Developer / Cloud infrastructure engineer

Responsibilities:

  • Architected data storage and processing capabilities in the cloud for reduced costs, ease of maintenance, and scalability. Designed and implemented ETL to Redshift on Amazon, while also employing other data lakes and tools.
  • Optimized Amazon Redshift clusters, Apache Hadoop clusters, data distribution, and data processing.
  • Developed MapReduce programs to process Avro files, perform calculations on the data, and execute map-side joins.
  • Imported bulk data into HBase using MapReduce programs.
  • Programmed ETL functions between Oracle and Amazon Redshift.
  • Designed and implemented incremental imports into Hive tables.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally on the cluster.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Flume.
  • Imported and exported data between HDFS and different relational data sources such as DB2, SQL Server, and Teradata using Sqoop.
  • Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
  • Collected real-time data from Kafka using Spark Streaming, performed transformations and aggregations on the fly to build the common learner data model, and persisted the data into HBase (see the streaming sketch after this list).
  • Worked on creating RDDs and DataFrames for the required input data and performed data transformations using Spark with Python.
  • Involved in developing Spark SQL queries and DataFrames, importing data from data sources, performing transformations and read/write operations, and saving the results to an output directory in HDFS.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
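
A minimal sketch of the Kafka-to-Spark Streaming aggregation referenced above, using the spark-streaming-kafka-0-10 direct stream API in Scala. The broker, topic, batch/window sizes, and aggregation key are placeholders, and the HBase write is left as a stub since the connector details are not stated.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object LearnerModelStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("learner-model-stream")
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "learner-model",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams)
        )

        // Count events per key over a sliding one-minute window
        // (assumes producers set a learner id as the record key).
        val countsPerLearner = stream
          .map(record => (record.key, 1L))
          .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))

        countsPerLearner.foreachRDD { rdd =>
          rdd.foreachPartition { partition =>
            // Persist each partition to HBase here (client setup omitted in this sketch).
            partition.foreach { case (learnerId, count) => println(s"$learnerId -> $count") }
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }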

Confidential, Natick, MA

Hadoop Systems Engineer / Big Data

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Developed simple to complex MapReduce streaming jobs using Python, along with jobs implemented in Hive and Pig.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
  • Tested Apache Tez, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Used Impala to read, write and query the Hadoop data in HDFS or HBase or Cassandra.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybanks and other sources.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Experience writing MapReduce jobs in Java, Pig, and Hive, and tuning MR/Hive queries.
  • Expertise in Tableau Server management (clustering, load balancing, user management, etc.).
  • Expertise in backing up and restoring the Tableau repository.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used Mahout to explore machine learning algorithms for efficient data processing.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats (see the sketch below).
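
A minimal sketch of the extraction-and-aggregation style of MapReduce job referenced in the last bullet. The original programs were written in Java; this version uses the same Hadoop MapReduce API from Scala to keep a single language across these sketches, and the CSV field position being extracted is assumed.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import scala.collection.JavaConverters._

    // Mapper: pull the (assumed) third CSV column out of each line and emit (value, 1).
    class FieldExtractMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
      private val one = new IntWritable(1)
      private val outKey = new Text()

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
        val fields = value.toString.split(",", -1)
        if (fields.length > 2) {
          outKey.set(fields(2).trim)
          context.write(outKey, one)
        }
      }
    }

    // Reducer: sum the counts for each extracted value.
    class CountReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                          context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
        val total = values.asScala.foldLeft(0)(_ + _.get)
        context.write(key, new IntWritable(total))
      }
    }

    object FieldCountJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "csv-field-count")
        job.setJarByClass(classOf[FieldExtractMapper])
        job.setMapperClass(classOf[FieldExtractMapper])
        job.setCombinerClass(classOf[CountReducer])
        job.setReducerClass(classOf[CountReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }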

Confidential, Greenville, SC

Big Data / Hadoop Engineer

Responsibilities:

  • Developed Hive and Pig scripts for data transformation.
  • Exposure to batch ETL and data warehousing concepts.
  • Developed Hadoop jobs through schedulers, both via the SSIS orchestration engine and through Oozie.
  • Developed Python, shell, Java, and HQL scripts for data flow orchestration.
  • Managed software builds when needed through Microsoft TFS and Git.
  • Supported REST-based ETL Hadoop software in higher environments such as UAT and production.
  • Built the SSIS packages that orchestrate the Greenplum jobs and troubleshot SSIS packages as needed.
  • Worked with the SQL Server metadata system.
  • Troubleshot the ASP.NET Web API-based REST layer.
  • Architected, designed, and developed Hadoop ETL using Kafka.
  • Created Spark, Pig, and Hive jobs using Python REST orchestration.
  • Built MapReduce API programs used in combination with Hive and HBase.
  • Worked on Greenplum (a Postgres-based database) to store the transformed data.
  • Created MongoDB collections for persistent storage.
  • Developed multiple Java-based Kafka producers and consumers from scratch per the business requirements (see the producer sketch after this list).
  • Worked on XML-, text-, and JSON-formatted data.
  • Used Avro schemas for Hive tables.

Confidential

ETL / Data Analytics Engineer

Responsibilities:

  • Involved in data solutions in support of advanced data analytics enabling the transition from log management to broader analytics.
  • Extensively worked on ETL loads (SSIS packages), report creation, and data visualization, including dashboards and reports leveraging charts (pivot tables), gauges, and drill-throughs.
  • Created SQL Agent jobs to schedule extraction of data from different source systems using SSIS packages. Stored procedures are used within these packages to aggregate the data.
  • Created SQL Server traces in SQL Server Profiler to collect a variety of information about SQL Server connections, stored procedures, and Transact-SQL statements.
  • Created traces using SQL Server Profiler to find long-running queries and modified those queries as part of performance tuning operations.
  • Created on-demand ad hoc reports, parameterized reports, linked reports, snapshot reports, drill-down reports, and subreports using SSRS.
  • Created numerous SSIS packages (ETL) to migrate data from different server locations and heterogeneous sources like Excel, CSV, flat file, XML and Text Format Data.
  • Implementing custom analytics and reporting tools to assist clients in market analysis (SAS, Tableau).
  • Performance tuning of SQL queries and stored procedures using SQL Profiler and Index Tuning Advisor.
  • Transformed data from various data sources using OLE DB connection by creating various SSIS packages.
  • Migrated data from different source locations to Dev / QA environment using SQL 2008 SSIS packages.
  • Created complex stored procedures, triggers, cursors, views, user-defined functions, and SQL Server traces in SQL Server Profiler to collect a variety of information about SQL Server transactions, server performance, and Transact-SQL (T-SQL) statements.
  • Executed test scripts, documented defects, and verified fixes using the bug tracking system, working with the BI design, development, and production teams.
  • Manually performed transcript data verification obtained from the client's database to validate the integrity of the application.

Confidential

SQL Data Analyst

Responsibilities:

  • Primarily worked on an ETL product handling complex, large-volume healthcare claims data. Designed the ETL framework and developed a number of packages to extract, transform, and load data using SQL Server Integration Services (SSIS) into local MS SQL 2008 databases to facilitate reporting operations.
  • Performed data source investigation, developed source-to-destination mappings, and performed data cleansing while loading data into staging/ODS regions.
  • Involved in various transformation and data cleansing activities using various control flow and data flow tasks in SSIS packages during data migration.
  • Performed performance monitoring and index optimization tasks using Performance Monitor, SQL Profiler, Database Tuning Advisor, and the Index Tuning Wizard.
