Big Data Engineer Resume

North Kansas City, MO

SUMMARY:

  • Extensive experience in the IT industry with expertise in Big Data
  • Worked in the healthcare & marketing industries
  • Worked on several Hadoop distribution platforms, including Cloudera CDH and Hortonworks HDP
  • Experienced in programming languages, such as Java, Python, Scala, and SQL
  • Proficient in utilizing the Big Data ecosystem, including Hadoop, HDFS, YARN, MapReduce, Spark, Hive, Impala, HBase, Sqoop, Flume, Kafka, Oozie, and Zookeeper
  • Experienced with real-time data processing mechanisms in the Big Data ecosystem, such as Apache Kafka and Spark Streaming (see the Scala sketch after this list)
  • Proficient in programming Scala to analyze large datasets using Spark Streaming
  • Experienced in writing HiveQL and developing Hive UDFs in Java to process and analyze data (a UDF sketch also follows this list)
  • Adept at using Sqoop to migrate data between RDBMS, NoSQL and HDFS
  • Worked with RDBMS including MySQL & Oracle SQL
  • Worked with NoSQL databases including HBase and Cassandra
  • Performed data visualization with Tableau
  • Worked with Windows & Linux operating systems for development
  • Excellent knowledge of Linux/Unix Shell Commands
  • Excellent knowledge of Unit Testing with Pytest, ScalaCheck, ScalaTest, JUnit and MRUnit
  • Familiar with software development techniques such as Agile/Scrum and Waterfall
  • Involved in building and evolving a reporting framework on top of the Hadoop cluster to facilitate data mining, analytics, and dashboarding
  • Supported a wide variety of ad-hoc data needs
  • Strong ability to prepare and present data in an easy-to-understand and visually appealing manner
  • Built high-volume real-time data processing applications on the Hadoop platform
  • Experienced in cloud platforms such as Amazon Web Services (AWS)
  • Experience working with large-scale data sources such as Oracle 11g, DB2, XML, and MS Excel
  • Demonstrated ability to communicate and gather requirements, partner with enterprise architects, business users, analysts, and development teams to deliver rapid iterations of complex solutions
  • Excellent teamwork, communication, and leadership skills
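
The two sketches below illustrate, in Scala, the kind of work the bullets above describe. Both are minimal, hedged examples: the broker address, topic, table, and function names are hypothetical placeholders, not details taken from this resume.

First, a Spark Streaming job consuming from Kafka, assuming Spark 1.6 with the Kafka 0.8 direct-stream integration (matching the versions listed under TECHNICAL SKILLS):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaStreamingSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaStreamingSketch"), Seconds(10))

        // Hypothetical broker list and topic name -- placeholders only.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("events"))

        // Count occurrences of each message value within every micro-batch.
        stream.map { case (_, value) => (value, 1L) }
              .reduceByKey(_ + _)
              .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

Second, a Hive UDF. The resume mentions UDFs written in Java; the sketch below uses Scala instead, purely to keep one language across these examples (Hive resolves the evaluate method by reflection, so a compiled Scala class behaves the same way):

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF that normalizes free-text fields; registered in Hive with
    // CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null else new Text(input.toString.trim.toLowerCase)
    }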

TECHNICAL SKILLS:

Programming Languages: Java 8, Python 2.7/3.7, Scala, R, MATLAB, and Arduino IDE

Operating Systems: Linux, Windows, and MacOS

Tools: Tableau, Plotly, Microsoft Office (Word, Excel with macros, PowerPoint), and Putty

Big Data Ecosystem: Apache Hadoop 2.5, Spark 1.6/2.3, MapReduce, Hive, HDFS, Kafka, Pig, Oozie, Sqoop

Database Technologies: MySQL, Microsoft Access, Hive, HBase, and Cassandra

Web Technologies: HTML 5, CSS, JavaScript

PROFESSIONAL EXPERIENCE:

Confidential, North Kansas City, MO

Big Data Engineer

Responsibilities:

  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Loaded and transformed large sets of structured and semi-structured data
  • Utilized Spark SQL to extract and process data, parsing it into Datasets or RDDs in HiveContext and applying transformations and actions (map, flatMap, filter, reduce, reduceByKey); see the sketch after this list
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework
  • Monitored and tuned Spark jobs running on the cluster
  • Worked on Flume for efficiently collecting, aggregating and moving large amounts of data
  • Installed and configured Zookeeper for Hadoop cluster
  • Worked on setting up high availability for cluster and designed automatic failover using Zookeeper
  • Worked with application teams to install Hadoop updates, patches, version upgrades, and operating system upgrades as required
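
A minimal sketch of the Spark SQL extraction pattern described above, assuming the Spark 1.6 / HiveContext API listed in the Environment line; the table and column names are hypothetical stand-ins for the project's real schema:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveExtractSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveExtractSketch"))
        val hiveContext = new HiveContext(sc)

        // Hypothetical Hive table -- not from the resume.
        val events = hiveContext.sql("SELECT record_id, event_type FROM clinical_events")

        // Drop to the RDD API and apply the transformations the bullet names.
        val countsByType = events.rdd
          .map(row => (row.getString(1), 1L))                  // map to (event_type, 1)
          .filter { case (eventType, _) => eventType != null } // filter out null keys
          .reduceByKey(_ + _)                                  // aggregate counts per type

        countsByType.take(10).foreach(println)
        sc.stop()
      }
    }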

Environment: Hadoop, HDFS, Spark 1.6.2, Flume 1.5.0, Zookeeper, Cloudera, MySQL 5.6, Putty, Eclipse

Confidential, Melbourne, FL

Project Owner / Developer

Responsibilities:

  • Developed a processing pipeline including transformations, estimations, and evaluation of analytical models
  • Performed pre-processing on the dataset prior to training, including standardization and normalization
  • Built models by implementing a recurrent neural network and training it on the dataset
  • Evaluated model accuracy by dividing data into training and test datasets and computing metrics using evaluators
  • Tuned training hyper-parameters by integrating cross-validation into pipelines (see the sketch after this list)
  • Troubleshot and tuned machine learning algorithm in Spark
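
A minimal sketch of the pipeline and cross-validation mechanics described above, using the Spark 1.6 ML API from the Environment line. Since Spark ML provides no recurrent-neural-network estimator, logistic regression stands in here purely to show the pipeline wiring; the feature column names are hypothetical:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
    import org.apache.spark.sql.DataFrame

    // `data` is assumed to have numeric feature columns f1, f2 and a "label" column.
    def trainWithCrossValidation(data: DataFrame): Unit = {
      val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

      val assembler = new VectorAssembler()
        .setInputCols(Array("f1", "f2")).setOutputCol("rawFeatures")
      val scaler = new StandardScaler()          // the standardization step mentioned above
        .setInputCol("rawFeatures").setOutputCol("features")
      val lr = new LogisticRegression()          // stand-in estimator, not an RNN

      val pipeline = new Pipeline().setStages(Array(assembler, scaler, lr))

      // Hyper-parameter grid searched by k-fold cross-validation.
      val grid = new ParamGridBuilder()
        .addGrid(lr.regParam, Array(0.01, 0.1))
        .build()

      val cv = new CrossValidator()
        .setEstimator(pipeline)
        .setEvaluator(new BinaryClassificationEvaluator())
        .setEstimatorParamMaps(grid)
        .setNumFolds(3)

      // Fit on the training split; evaluate on the held-out test split.
      val model = cv.fit(train)
      val auc = new BinaryClassificationEvaluator().evaluate(model.transform(test))
      println(s"Test AUC: $auc")
    }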

Environment: Spark 1.6.2, Spark MLlib, Spark ML, Hive 1.2.1, Flume 1.5.0, HBase 1.1.4, MySQL 5.6, Scala, Shell Scripting, Tableau 9.2

Confidential, Melbourne, FL

Data Analyst

Responsibilities:

  • Developed a data pipeline using Kafka, Hive, and Spark to ingest data into HDFS for analysis
  • Involved in creating Hive tables, loading them with data, and writing Hive queries for data analysis
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop
  • Imported data from different sources, such as HDFS and HBase, into Spark RDDs
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list)
  • Developed functional programs in Scala for connecting the streaming data application and gathering web data
  • Configured the connection between Hive and Tableau through Impala for BI development
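
A minimal sketch of the Hive-to-Spark conversion mentioned above: the same aggregation expressed once as a Hive/SQL query and once as RDD transformations in Scala. It assumes a Hive-enabled Spark 2.x session (the Environment line does not pin a Spark version), and the table and column names are hypothetical:

    import org.apache.spark.sql.SparkSession

    object HiveToRddSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToRddSketch")
          .enableHiveSupport()
          .getOrCreate()

        // The original HiveQL-style aggregation ...
        val viaSql = spark.sql("SELECT page, COUNT(*) AS hits FROM web_events GROUP BY page")

        // ... and the equivalent expressed as Spark RDD transformations.
        val viaRdd = spark.table("web_events").rdd
          .map(row => (row.getAs[String]("page"), 1L))
          .reduceByKey(_ + _)

        viaSql.show(5)
        viaRdd.take(5).foreach(println)
        spark.stop()
      }
    }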

Environment: Java 8, Scala, Apache Hadoop 2.5, Kafka, Spark, Hive, HDFS, YARN, MySQL 5.7, Tableau, and Microsoft Excel 2016

Confidential, Melbourne, FL

IT Engineer

Responsibilities:

  • Managed a project that improved total user activity within the mobile application from 65% to 75% by designing a gamification model integrating elements such as VIP perks, points, badges, user competitions, and leaderboards
  • Developed complete end-to-end big data processing in the Hadoop ecosystem
  • Exported monthly user engagement data from an RDBMS (MySQL) to CSV files
  • Preprocessed user data to identify missing or unrelated records
  • Wrote Hive queries for data analysis, filtering out the required data for further processing
  • Visualized the data using Tableau and Microsoft Excel
  • Exported the results into relational databases and used Tableau to build visualizations and generate reports for the BI team
  • Communicated deliverables to the BI team at periodic review meetings

Environment: Java 8, Apache Hadoop 2.5, Apache Sqoop, Hive, HDFS, MySQL 5.6, Tableau, and Microsoft Excel 2015

Confidential, Grand Rapids, MI

Data Analyst

Responsibilities:

  • Handled extraction of data from different databases and its transfer into HDFS using Sqoop
  • Loaded and transformed large sets of structured and semi-structured data
  • Performed transformations using Hive and loaded data into HDFS for aggregations
  • Utilized Spark SQL to extract and process data
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework
  • Built dynamic dashboards in Tableau from spreadsheet data

Environment: Java 8, Apache Hadoop 2.5, Apache Sqoop, Hive, HDFS, MySQL 5.6, Tableau, and Microsoft Excel 2014
