
Sr. Big Data Engineer Resume


Plano, TX

SUMMARY

  • A results-oriented software development professional with a bottom-line focus and a proven track record of over 6 years, specializing in the design and development of Big Data technologies on highly scalable, end-to-end Hadoop infrastructure to solve business problems involving large-scale data pipelines, data lakes, data warehouses, real-time analytics, and reporting solutions.
  • Experience with and deep understanding of the Hadoop ecosystem tools, including HDFS, MapReduce, Hive, Sqoop, Oozie, YARN, and Spark.
  • Experience in providing architectures for new proposals using different cloud technologies, Hadoop ecosystem tools, and reporting and modeling tools.
  • Leveraged AWS Redshift and Snowflake data warehouses to implement data lakes, enterprise data warehouses, and advanced data analytics solutions based on data collection and integration from multiple sources (Salesforce, Salesconnect, S3, SQL Server, Oracle, NoSQL).
  • Involved in the architectural design and development of highly scalable, optimized data models, data marts, Snowflake data warehouses, data lineage, and metadata repositories, using Jenkins, Vagrant, Vault, GitHub/Git-Bash Enterprise, and Terraform as IaC to provision cloud infrastructure and security.
  • Implemented an AWS data lake leveraging S3, Terraform, EC2, Lambda, VPC, and IAM for data processing and storage, while writing complex SQL queries and analytical and aggregate functions on views in the Snowflake data warehouse to develop near-real-time visualizations using Tableau Desktop/Server 10.4.
  • Performed data masking and ETL processes using S3, Informatica Cloud, Informatica PowerCenter, and Informatica Test Data Management to support a Snowflake data warehousing solution in the cloud.
  • Experience with and deep understanding of Solr, Banana, and D3.js for creating dashboards and search criteria over indexed data.
  • Experience with and deep understanding of data modeling and analytics using Python.
  • Experience in creating Google Analytics reports on market trends and measurements.
  • Experience in designing and developing data ingestion using Spark with Scala/Java, Apache NiFi, Spark Streaming, Kafka, Flume, Sqoop, and shell scripts (a minimal sketch follows this summary).
  • Experience in designing and developing solutions using Hadoop, AngularJS, .NET, and Tableau.
  • Good knowledge of RESTful web services.
  • Knowledge of AWS services such as EC2, DynamoDB, S3, Kinesis, Redshift, Lambda, and RDS.
  • Experience working with RDBMS (Microsoft SQL Server, Oracle, and MySQL), Vertica, Teradata, and NoSQL (HBase).
  • Experience in an Agile framework with Jira and TFS as Scrum Master.
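As an illustration of the Spark-with-Scala/Kafka ingestion pattern mentioned above, a minimal Structured Streaming sketch might look like the following. The broker address, topic name, and S3 paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath; this is an outline under those assumptions, not production code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaToLakeIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-datalake-ingest")
      .getOrCreate()

    // Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "sales_events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr(
        "CAST(key AS STRING) AS event_key",
        "CAST(value AS STRING) AS payload",
        "timestamp")

    // Land the raw events in the data lake as Parquet, partitioned by ingest date
    val query = events
      .withColumn("ingest_date", to_date(col("timestamp")))
      .writeStream
      .format("parquet")
      .option("path", "s3a://example-datalake/raw/sales_events/")
      .option("checkpointLocation", "s3a://example-datalake/checkpoints/sales_events/")
      .partitionBy("ingest_date")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

Because committed Kafka offsets live in the checkpoint location, a restarted stream resumes from where it left off rather than re-reading the topic.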

TECHNICAL SKILLS

Languages: Scala, Python, HTML, JSON, SQL, R

IaaS: AWS, Google Cloud Platform

Containers: Docker

Distributed databases: Cassandra, HBase, MongoDB

Distributed query engine: AWS Athena, Hive, Presto

Distributed file systems: HDFS, S3

Distributed computing environment: Amazon EMR, Hortonworks

Operations: Ambari, ZooKeeper

Scheduling: Oozie

Data Flow: NiFi, Kafka, Amazon Kinesis, Firehose

Distributed data processing: Hadoop, Spark

Search & indexing: Solr / Lucene, Elasticsearch

Relational databases: Oracle, MySQL, IBM DB2, MS SQL Server

Source Control: Git, Subversion

PROFESSIONAL EXPERIENCE

Confidential, Plano TX

Sr. Big Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop and Spark with Scala.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data (see the sketch after this list).
  • Handled large datasets using partitions, broadcasts in Spark, effective and efficient joins, transformations, and other optimizations during the ingestion process itself.
  • Worked with Spark on Amazon EMR to process data directly in S3.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Implemented pre-defined operators in Spark such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey.
  • Experience in developing Spark programs for batch and real-time processing; developed Spark Streaming applications for real-time processing.
  • Worked with different file formats (ORC, text) and different compression codecs (gzip, snappy, LZO).
  • Developed complex ETL transformations and performed performance tuning.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Worked extensively on Hive, SQL, Scala, Spark, and shell.
  • Handled data in various file formats such as Sequence, Avro, RC, Parquet, and ORC.
  • Experienced in writing Spark RDD transformations and actions for the input data, and Spark SQL queries and DataFrames to import data from data sources, perform data transformations and read/write operations using Spark Core, and save the results to an output directory in HDFS.
  • Responsible for the design and development of Spark applications using Scala to interact with Hive and MySQL databases.
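A minimal sketch of the DataFrame/Spark SQL pattern referenced above, combining a broadcast join with a partitioned, snappy-compressed ORC write. Bucket, database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object OrderEnrichmentJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("order-enrichment")
      .enableHiveSupport()
      .getOrCreate()

    // Large fact data read directly from S3 in ORC format
    val orders = spark.read.orc("s3a://example-bucket/raw/orders/")

    // Small dimension table broadcast to every executor to avoid a shuffle join
    val customers = spark.table("dw.customers")
    val enriched = orders.join(broadcast(customers), Seq("customer_id"), "left")

    // Register as a temp view so the same logic can be expressed in Spark SQL
    enriched.createOrReplaceTempView("enriched_orders")
    val daily = spark.sql(
      """SELECT order_date, region, COUNT(*) AS orders, SUM(amount) AS revenue
        |FROM enriched_orders
        |GROUP BY order_date, region""".stripMargin)

    // Write back as snappy-compressed ORC, partitioned by order date
    daily.write
      .mode("overwrite")
      .format("orc")
      .option("compression", "snappy")
      .partitionBy("order_date")
      .save("s3a://example-bucket/curated/daily_orders/")
  }
}
```

Broadcasting the small dimension table keeps the join map-side, which is the usual way to avoid shuffling the large fact dataset during ingestion.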

Confidential, Pleasanton CA

Hadoop Developer / Big Data Developer

Responsibilities:

  • Developed Spark applications using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Responsible for building scalable distributed data solutions using Hadoop and Spark.
  • Developed Spark jobs and Hive jobs to summarize and transform data.
  • Developed Spark jobs using Scala on top of YARN for interactive and batch analysis.
  • Handled importing of data from various data sources, performed transformations using Spark, and loaded data into Hive.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Used storage formats like Avro to access multiple columns of data quickly in complex queries.
  • Used Spark SQL to read Parquet data and create tables in Hive using the Scala API (see the sketch after this list).
  • Developed a data pipeline using Kafka to store data in HDFS.
  • Imported data from our relational data stores to Hadoop using Sqoop.
  • Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
  • Wrote Pig scripts and executed them using the Grunt shell.
  • Worked on the conversion of existing MapReduce batch applications for better performance.
  • Developed multiple MapReduce jobs to perform data cleaning and pre-processing.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Generated the required reports for the operations team from the ingested data using Oozie workflows and Hive queries.
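A small sketch of the Parquet-to-Hive step referenced above, with the same aggregation shown once through Spark SQL and once as an RDD transformation (the style used when converting Hive/SQL logic). Paths, database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParquetToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Read the landed Parquet data and register it as a managed Hive table
    val claims = spark.read.parquet("hdfs:///data/landing/claims/")
    claims.write.mode("overwrite").saveAsTable("analytics.claims")

    // The same aggregation expressed once in Spark SQL ...
    val byState = spark.sql(
      "SELECT state, COUNT(*) AS claim_count FROM analytics.claims GROUP BY state")

    // ... and once as an RDD transformation over the underlying rows
    val byStateRdd = claims.rdd
      .map(row => (row.getAs[String]("state"), 1L))
      .reduceByKey(_ + _)

    byState.show()
    byStateRdd.take(10).foreach(println)
  }
}
```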

Confidential, Deerfield, IL

Big Data / Hadoop Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in collecting, aggregating, and moving log data from servers to HDFS using Flume.
  • Imported and exported data between different relational data sources like DB2, SQL Server, and Teradata and HDFS using Sqoop.
  • Ingested data from legacy and upstream systems into HDFS using Apache Sqoop, Flume, and Hive queries.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally.
  • Designed and implemented incremental imports into Hive tables (see the sketch after this list).
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Involved in creating Oozie workflow and coordinator jobs to kick off jobs on time for data availability.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
  • Exported data to Tableau and Excel with Power View for presentation and refining.
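The incremental imports referenced above were driven by Sqoop and Oozie; the sketch below shows only the Hive-side piece, assuming a daily delta already staged in HDFS and appending it into a dynamically partitioned Hive table. All paths, table, and column names are hypothetical, and the target table is assumed to already exist.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

object IncrementalHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("incremental-hive-load")
      .enableHiveSupport()
      .getOrCreate()

    // Allow writes into partitions that are determined at runtime
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Daily delta staged in HDFS by the upstream import (e.g. a Sqoop job)
    val delta = spark.read.parquet("hdfs:///staging/teradata/transactions/2020-01-15/")

    // Append the delta into the partitioned Hive target; for insertInto the
    // partition column (txn_date) must come last in the select list
    delta
      .withColumn("txn_date", to_date(delta("txn_ts")))
      .select("txn_id", "account_id", "amount", "txn_date")
      .write
      .mode("append")
      .insertInto("edw.transactions")
  }
}
```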

Confidential

Data Analyst

Responsibilities:

  • Performed in-depth analysis of data and prepared daily reports using SQL, MS Excel, MS PowerPoint, and SharePoint.
  • Created complex SQL queries and scripts to extract and aggregate data to validate its accuracy.
  • Prepared high-level analysis reports with Excel and Tableau that provided feedback on the quality of data, including identification of patterns and outliers.
  • Used advanced Microsoft Excel techniques such as pivot tables and VLOOKUP to create intuitive dashboards in Excel.
  • Parsed XML files and JSON documents to load the data from them into a database by scripting in Python (see the sketch after this list).
  • Extensively used Python modules like NumPy, pandas, datetime, and SQLAlchemy to perform data analysis.
  • Managed storage in AWS using S3, created volumes, and configured snapshots.
  • Created EC2 instances to run automated Python scripts; automated EC2 instances using AWS CloudFormation templates.
  • Used MS Excel to import and export data from text files, saved queries, or databases.
  • Created pivot tables and charts using MS Excel worksheet data and external resources; modified pivot tables, sorted items, grouped data, and refreshed and formatted pivot tables.
  • Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps, and Gantt charts.
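The JSON-to-database loads referenced above were done with Python scripts; for consistency with the other examples here, the sketch below shows an equivalent flow in Spark/Scala, reading JSON documents and writing to a relational table over JDBC. Paths, connection details, and column names are hypothetical, and a MySQL JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object JsonToDatabase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-database")
      .getOrCreate()

    // Parse a directory of JSON documents into a DataFrame (schema is inferred)
    val docs = spark.read
      .option("multiLine", "true")
      .json("s3a://example-bucket/exports/accounts/*.json")

    // Keep only the fields needed downstream (column names are illustrative)
    val accounts = docs.select("account_id", "account_name", "created_at")

    // Load the parsed records into a relational staging table over JDBC
    accounts.write
      .format("jdbc")
      .option("url", "jdbc:mysql://db.example.com:3306/reporting")
      .option("dbtable", "accounts_stage")
      .option("user", sys.env.getOrElse("DB_USER", "reporting_user"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .mode("append")
      .save()
  }
}
```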

Confidential

ETL Data Analyst

Responsibilities:

  • Analyzed data and implemented data analysis using standard SQL and Tableau.
  • Experienced in manipulating, cleansing, and processing data using Excel, Access, and SQL.
  • Experience in DBMS/RDBMS implementation using object-oriented concepts and database toolkits.
  • Experience in data analysis and ETL techniques for loading high volumes of data and ensuring a smooth structural flow of the data.
  • Experience in creating Tableau dashboards and interactive reports. Created visualizations like side-by-side bars, scatter plots, stacked bars, heat maps, filled maps, and symbol maps in Tableau.
  • Worked on integration and implementation of projects, creating and modeling databases. Developed and maintained databases using SQL.
  • Designed and analyzed A/B tests to drive KPI improvements (see the sketch after this list) and developed rich dashboards using business intelligence (BI) technologies such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.
  • Built and managed models to forecast key company metrics and to better understand the underlying drivers of the Sales division's performance from data.
  • Gathered, cleansed, and analyzed data, and performed statistical analysis such as regression and data validation using MS Excel.
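One concrete way to read the A/B-test analysis mentioned above is as a two-proportion z-test on a conversion-rate KPI; the sketch below uses made-up counts purely for illustration.

```scala
object AbTestZTest {
  /** Two-proportion z-test for an A/B test on a conversion-rate KPI. */
  def zScore(convA: Long, visitsA: Long, convB: Long, visitsB: Long): Double = {
    val pA = convA.toDouble / visitsA
    val pB = convB.toDouble / visitsB
    // Pooled conversion rate under the null hypothesis of no difference
    val pPooled = (convA + convB).toDouble / (visitsA + visitsB)
    val se = math.sqrt(pPooled * (1 - pPooled) * (1.0 / visitsA + 1.0 / visitsB))
    (pB - pA) / se
  }

  def main(args: Array[String]): Unit = {
    // Illustrative numbers only: 4.0% vs 4.6% conversion over 10,000 visits each
    val z = zScore(convA = 400, visitsA = 10000, convB = 460, visitsB = 10000)
    println(f"z = $z%.2f (|z| > 1.96 is significant at the 5%% level)")
  }
}
```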
