Big Data Analyst Resume
Chicago, Illinois
SUMMARY
- 5 years of professional IT experience in analysis, design, development, and implementation of business applications, with strong knowledge of Python, the Big Data/Hadoop ecosystem, and RDBMS-related technologies.
- Master’s graduate with the ability to define strategic data initiatives that meet long-term goals, make an impact, and execute and deliver iteratively. ITIL certified.
- A natural learner and experienced data lead with excellent communication and mentoring abilities and a passion for delivering sound solutions.
- Skilled Python programmer with experience using Pandas, NumPy, Matplotlib, TensorFlow, and Keras.
- Domain exposure in healthcare, insurance, and financial systems, including 4 years of experience with Big Data ecosystem technologies such as HDFS, MapReduce, Pig, Hive, Spark, Apache Kafka, Oozie, and ZooKeeper.
- In-depth knowledge of Hadoop architecture and its components, including YARN, HDFS, NameNode, DataNode, JobTracker, ApplicationMaster, TaskTracker, and the MapReduce programming paradigm (batch and stream-oriented processing).
- 3+ years of experience building Hadoop-based ETL workflows to ingest, transform, and aggregate data using Spark and Hive (a brief sketch follows this summary).
- 2+ years of experience in data integration using the Cloudera Hadoop distribution, Amazon EC2, Amazon S3, and Amazon EMR.
- 3+ years of experience analyzing data using Hive Query Language (HiveQL), Pig, and MapReduce.
- 3+ years of experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper for production data support.
- Worked on analyzing the Hadoop stack and various big data analytics tools, including Pig, Hive, Spark, Sqoop, Storm, Kafka, and HBase, in both FDD and TDD environments.
- Experience importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
- Extensive knowledge of ingesting unstructured data into HDFS using Apache NiFi and Flume, and structured data using Sqoop.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture. Hands-on experience with in-memory data processing using Apache Spark with Scala; developed analytical components using Scala, Spark, Spark SQL, and Spark Streaming.
- Built massively scalable, multi-threaded applications for large-scale data processing, primarily with Apache Spark and Scala on Hadoop.
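The ingest/transform/aggregate ETL pattern with Spark and Hive mentioned above can be illustrated with a minimal PySpark sketch. This is an assumption-laden example rather than actual project code: the database, table, and column names (staging.raw_claims, analytics.daily_claim_totals, provider_id, claim_amount) are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical table and column names, for illustration only.
spark = (SparkSession.builder
         .appName("etl-aggregation-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Ingest: read raw records from a Hive source table.
raw = spark.table("staging.raw_claims")

# Transform: drop incomplete records and derive a date column.
cleaned = (raw
           .filter(F.col("claim_amount").isNotNull())
           .withColumn("claim_date", F.to_date("claim_timestamp")))

# Aggregate: daily totals per provider, written back to a Hive table.
daily_totals = (cleaned
                .groupBy("provider_id", "claim_date")
                .agg(F.sum("claim_amount").alias("total_amount"),
                     F.count("*").alias("claim_count")))

(daily_totals.write
 .mode("overwrite")
 .saveAsTable("analytics.daily_claim_totals"))
```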
TECHNICAL SKILLS
Languages: HQL, SQL, Scala, Python
Database: MySQL, HBase, Cassandra, MongoDB.
Hadoop Ecosystem: Hive, Scala, Spark, Sqoop, Oozie, ZooKeeper, Kafka, YARN, MapReduce, Pig.
Reporting tools: Tableau, Advanced MS Excel.
Operating Systems: Windows, UNIX, CentOS, Mac OS.
PROFESSIONAL EXPERIENCE
BIG DATA ANALYST
Confidential, Chicago, Illinois
Responsibilities:
- Worked on big data integration and analytics based on Hadoop, Hive, and NoSQL databases.
- Performed data profiling and data integration on source system data to determine the actual content, structure, and quality of the data.
- Worked with large, highly unstructured and semi-structured data sets to aggregate and report on them.
- Migrated the data needed for big data integration from Oracle and MySQL into HDFS using Sqoop, and imported flat files of various formats into HDFS.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.
- Built real-time streaming data pipelines with Apache Kafka, Spark Streaming, and Cassandra (see the sketch after this list).
- Implemented a streaming application with Apache Kafka to monitor sales of items across all US stores, covering design, development, and implementation.
- Automated data flow between software systems using Apache NiFi.
- Actively interacted with the business to fully understand system requirements.
- Migrated an existing on-premises application to AWS.
- Transferred data from AWS S3 to AWS Redshift using Informatica.
- Designed and implemented a test environment on AWS.
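The streaming pipeline pattern referenced above (Kafka into Spark Streaming into Cassandra) is sketched below using PySpark Structured Streaming. It is an illustrative assumption rather than the production job: the broker address, topic, schema, keyspace, and table names are hypothetical, and it assumes the spark-sql-kafka and spark-cassandra-connector packages are available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

# Hypothetical broker, topic, keyspace, and table names, for illustration only.
spark = (SparkSession.builder
         .appName("sales-stream-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .getOrCreate())

sale_schema = StructType([
    StructField("store_id", StringType()),
    StructField("item_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw sales events from Kafka and parse the JSON payload.
sales = (spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "sales_events")
         .load()
         .select(from_json(col("value").cast("string"), sale_schema).alias("sale"))
         .select("sale.*"))

# Write each micro-batch to Cassandra through the Spark Cassandra connector.
def write_to_cassandra(batch_df, batch_id):
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="retail", table="sales_by_store")
     .mode("append")
     .save())

query = (sales.writeStream
         .foreachBatch(write_to_cassandra)
         .option("checkpointLocation", "/tmp/checkpoints/sales")
         .start())
query.awaitTermination()
```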
ASSISTANT DATA ENGINEER
Confidential, St. Louis, MO
Responsibilities:
- Designed, developed, and maintained new-generation, machine-learning-based big data systems for web page categorization, data/IP mining, and malicious site detection. Worked with real-time data processing, messaging, streaming techniques, and workflows.
- Processed and analyzed cloud data using Hadoop 2.0 tools (Hive, Pig) and wrote UDFs to implement custom functions.
- Imported and retrieved datasets from MDM using Sqoop from the RDBMS for data processing with MapReduce.
- Migrated Hive queries (including UDFs) to be compatible with the latest version of Spark and performed data transformations using CQL.
- Crafted algorithms in Python to filter data collection results and worked on PySpark projects as part of the Confidential &D department.
- Processed large sets of structured, semi-structured, and unstructured data; experienced with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop (a minimal sketch follows this list).
- Managed large datasets using Pandas data frames and MySQL.
- Paired up to build complex data pipelines by ingesting and transforming unstructured, petabyte-scale datasets, enabling faster and better data analytics.
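The DataFrame/UDF aggregation pattern referenced above is sketched below. The work itself was in Scala on Spark 2.0.0 with Sqoop handling the RDBMS export; this sketch uses PySpark and a direct JDBC write purely to keep the example short and self-contained, and every table, column, and connection name in it is hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-aggregation-sketch").getOrCreate()

# Hypothetical input: transaction records already loaded into a DataFrame.
txns = spark.createDataFrame(
    [("T1", "electronics", 199.99), ("T2", "grocery", 23.50), ("T3", "electronics", 49.00)],
    ["txn_id", "category", "amount"],
)

# A simple UDF bucketing transactions by size, standing in for the custom
# DataFrame/SQL UDFs described in the bullet above.
@F.udf(returnType=StringType())
def size_bucket(amount):
    return "large" if amount >= 100 else "small"

summary = (txns
           .withColumn("bucket", size_bucket(F.col("amount")))
           .groupBy("category", "bucket")
           .agg(F.sum("amount").alias("total_amount")))

# Write the aggregated result back to an RDBMS. The resume names Sqoop for this
# step; a JDBC write (requiring the MySQL driver on the classpath) is shown here
# only to keep the sketch self-contained.
(summary.write
 .format("jdbc")
 .option("url", "jdbc:mysql://db-host:3306/analytics")
 .option("dbtable", "txn_summary")
 .option("user", "etl_user")
 .option("password", "etl_password")
 .mode("append")
 .save())
```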
Data Engineer
Confidential
Responsibilities:
- Led the APM tower and directed a cross-functional team to build an application monitoring system for clients.
- Performed a proof of concept (POC) by analyzing MDM data insights from all competitive channels using complex SQL manipulation, and compiled to-be recommendations for the new system.
- Automated ETL data mart pipelines, making it easier to wrangle data and reducing the time to create and complete ETL operations by as much as 40%.
- Responsible for managing and validating data coming from different sources.
- Responsible for loading data from UNIX file systems into HDFS.
- Created Hive external tables for each source table in the data lake and wrote Hive jobs to parse the logs and structure them in tabular format, facilitating effective querying of the log data (see the sketch after this list).
- Actively interacted with the business to fully understand system requirements.
- Responsible for performing transformations such as sort, join, aggregation, and filter in order to retrieve various datasets.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Optimized Hive queries to increase performance and reduce query execution time.
- Involved in writing Flume and Hive scripts to extract, transform, and load the data into the database.
- Managed large datasets using Pandas data frames and MySQL.
- Built complex data pipelines by ingesting and transforming unstructured, petabyte-scale datasets, enabling faster and better data analytics.
- Proven ability in administering Hadoop clusters by configuring HDFS NameNodes and DataNodes.
- Processed and analyzed data using Hadoop 2.0 tools (Hive, Pig) and wrote UDFs to implement custom functions on a medium-scale cluster.
- Redesigned post-implementation active support by working with the engineering team to define schemas and ETL processes for the data warehouse per business needs.
- Built and delivered training on a real-time service intelligence dashboard for data visualization, used for proactive monitoring of critical applications with BMC BEM and Tableau.
- Highly skilled in Excel, with strong knowledge of and experience with pivot tables, VLOOKUP, sorting, and filtering.
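The Hive external table and log-query work described in this section follows a common pattern, sketched below with HiveQL submitted through PySpark. The database, file location, delimiter, and column names (datalake.app_logs, /data/lake/app_logs/) are hypothetical placeholders rather than the actual schema.

```python
from pyspark.sql import SparkSession

# Hypothetical database, path, and column names, for illustration only.
spark = (SparkSession.builder
         .appName("hive-external-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over raw log files already landed in the data lake;
# dropping the table leaves the underlying files in place.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS datalake.app_logs (
        log_ts     TIMESTAMP,
        level      STRING,
        component  STRING,
        message    STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    STORED AS TEXTFILE
    LOCATION '/data/lake/app_logs/'
""")

# A typical query structuring the parsed logs for analysis.
errors_per_component = spark.sql("""
    SELECT component, COUNT(*) AS error_count
    FROM datalake.app_logs
    WHERE level = 'ERROR'
    GROUP BY component
    ORDER BY error_count DESC
""")
errors_per_component.show()
```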