Big Data Engineer Resume
North Kansas City, MO
SUMMARY:
- Extensive experience in the IT industry with expertise in Big Data
- Worked in the healthcare & marketing industries
- Worked on several Hadoop distribution platforms, including Cloudera CDH and Hortonworks HDP
- Experienced in programming languages, such as Java, Python, Scala, and SQL
- Proficient in utilizing Big Data Ecosystem including Hadoop, HDFS, YARN, MapReduce, Spark, Hive, Impala, HBase, Sqoop, Flume, Kafka, Oozie and Zookeeper
- Experienced with real-time data processing mechanisms in the Big Data ecosystem, such as Apache Kafka and Spark Streaming
- Proficient in programming Scala to analyze large datasets using Spark Streaming
- Experienced in writing HiveQL and developing Hive UDFs in Java to process and analyze data
- Adept at using Sqoop to migrate data between RDBMS, NoSQL and HDFS
- Worked with RDBMS including MySQL & Oracle SQL
- Worked with NoSQL databases including HBase and Cassandra
- Performed data visualization with Tableau
- Worked with Windows & Linux operating systems for development
- Excellent knowledge of Linux/Unix Shell Commands
- Excellent knowledge of Unit Testing with Pytest, ScalaCheck, ScalaTest, JUnit and MRUnit
- Familiar with software development techniques such as Agile/Scrum and Waterfall
- Involved in building and evolving a reporting framework on top of the Hadoop cluster to facilitate data mining, analytics, and dashboarding
- Support a wide variety of ad-hoc data needs
- Strong ability to prepare and present data in an easy-to-understand and visually appealing manner
- Build high volume real-time data processing applications using Hadoop platform
- Experienced in cloud platforms such as Amazon Web Services (AWS)
- Experience working with large-scale databases such as Oracle 11g and DB2, and with data sources such as XML and MS Excel
- Demonstrated ability to communicate and gather requirements, partner with enterprise architects, business users, analysts and development teams to deliver rapid iterations of complex solutions
- Excellent teamwork, communication, and leadership skills
TECHNICAL SKILLS:
Programming Languages: Java 8, Python 2.7/3.7, Scala, R, MATLAB, and Arduino IDE
Operating Systems: Linux, Windows, and MacOS
Tools: Tableau, Plotly, Microsoft Office (Word, Excel with macros, PowerPoint), and Putty
Big Data Ecosystem: Apache Hadoop 2.5, Spark 1.6/2.3, MapReduce, Hive, HDFS, Kafka, Pig, Oozie, Sqoop
Database Technologies: MySQL, Microsoft Access, Hive, HBase, and Cassandra
Web Technologies: HTML 5, CSS, and JavaScript
PROFESSIONAL EXPERIENCE:
Confidential, North Kansas City, MO
Big Data Engineer
Responsibilities:
- Worked with the Data Science team to gather requirements for various data mining projects.
- Loaded and transformed large sets of structured and semi-structured data
- Utilized Spark SQL to extract and process data, parsing it into Datasets or RDDs in HiveContext and applying transformations and actions (map, flatMap, filter, reduce, reduceByKey)
- Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework
- Monitored and tuned Spark jobs running on the cluster
- Worked on Flume for efficiently collecting, aggregating and moving large amounts of data
- Installed and configured Zookeeper for Hadoop cluster
- Worked on setting up high availability for cluster and designed automatic failover using Zookeeper
- Worked with application teams to install Hadoop updates, patches, version upgrades, and operating system upgrades as required
Environment: Hadoop, HDFS, Spark 1.6.2, Flume 1.5.0, Zookeeper, Cloudera, MySQL 5.6, Putty, Eclipse
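The RDD-style transformations named above (map, flatMap, filter, reduceByKey) can be sketched locally in plain Python, without a Spark cluster; the log records and field layout here are hypothetical, not taken from the actual project:

```python
# Local, cluster-free sketch of RDD-style transformations
# (flatMap, filter, map, reduceByKey); all data is hypothetical.
lines = ["u1 click view", "u2 view", "u3 click", "u1 click"]

tokens = [t for line in lines for t in line.split()]   # flatMap: line -> tokens
events = [t for t in tokens if not t.startswith("u")]  # filter: drop user ids
pairs = [(t, 1) for t in events]                       # map: token -> (key, 1)

counts = {}                                            # reduceByKey: sum per key
for key, n in pairs:
    counts[key] = counts.get(key, 0) + n
print(counts)
```

In Spark the same pipeline would distribute each step across partitions; the shuffle happens at the reduceByKey stage.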
Confidential, Melbourne, FL
Project Owner / Developer
Responsibilities:
- Developed a processing pipeline including transformations, estimations, evaluation of analytical models
- Performed pre-processing on the dataset prior to training, including standardization and normalization
- Built models by implementing a recurrent neural network and training it on the dataset
- Evaluated model accuracy by dividing data into training and test datasets and computing metrics using evaluators
- Tuned training hyper-parameters by integrating cross-validation into pipelines
- Troubleshot and tuned machine learning algorithm in Spark
Environment: Spark 1.6.2, Spark MLlib, Spark ML, Hive 1.2.1, Flume 1.5.0, HBase 1.1.4, MySQL 5.6, Scala, Shell Scripting, Tableau 9.2
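The train/test evaluation step described above can be illustrated with a minimal stdlib-only Python sketch (the real project used Spark ML evaluators); the data, the threshold model, and the 80/20 split are assumptions for illustration:

```python
import random

# Hypothetical labeled dataset: (feature, label) pairs, label = feature > 0.5
random.seed(42)
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(100))]

# Divide data into training and test sets, as in the evaluation step above
random.shuffle(data)
train, test = data[:80], data[80:]

# A trivial stand-in "model": predict 1 when the feature exceeds a threshold
threshold = 0.5
predict = lambda x: int(x > threshold)

# Accuracy metric computed on the held-out test set only
correct = sum(predict(x) == y for x, y in test)
accuracy = correct / len(test)
```

Cross-validation, as mentioned in the tuning bullet, would repeat this split k times and average the metric across folds.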
Confidential, Melbourne, FL
Data Analyst
Responsibilities:
- Developed data pipeline using Kafka, Hive and Spark to ingest data into HDFS for analysis
- Involved in creating Hive Tables, loading them with data and writing hive queries for data analysis
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources
- Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop
- Imported data from different sources like HDFS and HBase into Spark RDDs
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and Scala
- Developed functional programs in Scala for connecting the streaming data application and gathering web data
- Configured connectivity between Hive and Tableau using Impala as the BI development tool
Environment: Java 8, Scala, Apache Hadoop 2.5, Kafka, Spark, Hive, HDFS, YARN, MySQL 5.7, Tableau, and Microsoft Excel 2016
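The Hive-to-Spark conversion mentioned above (rewriting a SQL aggregate as key-value transformations) can be sketched in plain Python; the table rows and column names here are hypothetical. The logic mirrors rewriting "SELECT dept, AVG(salary) FROM emp GROUP BY dept" as a per-key combine:

```python
# Hypothetical rows mirroring a Hive table (dept, salary)
rows = [("eng", 100), ("eng", 120), ("ops", 80)]

# Per-key (sum, count) combine, as aggregateByKey would do across partitions
sums = {}
for dept, salary in rows:
    total, count = sums.get(dept, (0, 0))
    sums[dept] = (total + salary, count + 1)

# Final map: (sum, count) -> average per department
averages = {dept: total / count for dept, (total, count) in sums.items()}
```

Carrying (sum, count) pairs rather than raw averages is what makes the combine associative, so it can run distributed before a single final division.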
Confidential, Melbourne, FL
IT Engineer
Responsibilities:
- Managed a project that improved total user activity within the mobile application from 65% to 75% by designing a gamification model integrating elements such as VIP perks, points, badges, user competitions, and leaderboards
- Developed complete end-to-end big data processing in the Hadoop ecosystem
- Exported user engagement data from an RDBMS (MySQL) to CSV files for each month
- Preprocessed user data to detect missing or unrelated records
- Wrote Hive queries for data analysis, filtering out the required data for further processing
- Visualized the data using Tableau software and Microsoft Excel
- Exported results into relational databases and used Tableau to create visualizations and generate reports for the BI team
- Communicated deliverables to the BI team at periodic review meetings
Environment: Java 8, Apache Hadoop 2.5, Apache Sqoop, Hive, HDFS, MySQL 5.6, Tableau, and Microsoft Excel 2015
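The preprocessing step above (detecting missing data in monthly CSV exports) can be sketched with the Python standard library; the column names and sample rows are assumptions, not the project's real schema:

```python
import csv
import io

# Hypothetical monthly engagement export; column names are illustrative only
raw = """user_id,sessions,last_active
u1,12,2015-01-30
u2,,2015-01-12
u3,7,
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Flag rows with any empty field, as in the preprocessing step above
incomplete = [r["user_id"] for r in rows if not all(r.values())]
```

In practice the flagged rows would be dropped or imputed before the Hive analysis stage.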
Confidential, Grand Rapids, MI
Data Analyst
Responsibilities:
- Handled extraction of data from different databases and transferring into HDFS using Sqoop
- Loaded and transformed large sets of structured and semi-structured data
- Performed transformations using Hive and loaded data into HDFS for aggregations
- Utilized SparkSQL to extract and process data
- Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework
- Built dynamic dashboards from spreadsheet data within Tableau
Environment: Java 8, Apache Hadoop 2.5, Apache Sqoop, Hive, HDFS, MySQL 5.6, Tableau, and Microsoft Excel 2014