Big Data Engineer Resume
SUMMARY
- More than 7 years of IT experience and 6 years of Big Data technology experience.
- Experience working in Big Data ecosystems, including ingestion, storage, querying, processing, and analysis of big data.
- Strong hands-on experience developing batch jobs using MapReduce and Spark SQL, and developing real-time data ingestion pipelines using Kafka and Storm.
- Experience writing custom MapReduce programs in Java using Apache Crunch.
- Experience with different Big Data storage systems such as HDFS, HP Vertica, and HBase.
- Experience writing UDFs in Hive.
- Knowledge of CSV, Avro, JSON, and Parquet file formats.
- Experience in scheduling batch jobs using Oozie.
- Experience in data visualization using Tableau.
- Experience in analyzing application and system logs using Splunk.
- Strong software development background in functional and object-oriented programming.
- Excellent understanding of Agile and Scrum development methodologies.
- Good knowledge of data warehousing concepts and ETL.
- Experience working with the AWS ecosystem.
- Knowledge of data analytics, predictive analysis, statistical modelling, and machine learning concepts.
- Career experience as both a Data Scientist and a Big Data Engineer.
- Entrepreneurial experience along with other soft skills.
TECHNICAL SKILLS
Big Data Technologies: MapReduce, Spark, Pig, Hive, HBase, HDFS, HP Vertica, Avro, Parquet, Sqoop, Kafka, Storm, Oozie, Chef, Mahout, Splunk, Hadoop, Apache Crunch
Programming Languages: Java, Python, C, C++, C#, R, Scala
Software Tools: Visual Studio 2010, Eclipse, MATLAB, IntelliJ, Jenkins, Tableau, SAP Business Objects
RDBMS: MySQL, Microsoft SQL Server, Oracle
Cloud Computing: Amazon Web Services, including S3, EC2, and Route 53
PROFESSIONAL EXPERIENCE
Confidential
Big Data Engineer
Responsibilities:
- Processed millions of healthcare data records each day in near real time and maintained the big data platform that processes them.
- Developed data models in Apache Avro framework and maintained ETL mappings and transformations.
- Analyzed various types of machine and application logs using Splunk and troubleshot issues as they arose.
- Wrote cookbooks and recipes using Chef to manage resources.
- Developed multiple MapReduce jobs using Apache Crunch for daily incremental/historical data processing.
- Deployed Oozie coordinators to run various MapReduce jobs for different clients.
- Monitored the health of scheduled ETL runs for over a hundred clients and fixed issues.
- Responsible for importing, loading, analyzing, transforming, and storing data in HDFS and HP Vertica.
- Wrote various Hive queries to analyze the data coming from various sources and to troubleshoot.
- Used a Lambda architecture to process data in both batch and real-time modes.
- Used Kafka to generate notifications for data arriving from various sources.
- Created Storm topologies to process the data in real time and loaded data in HBase tables.
- Supported various web analytics solutions built on Hadoop, Oozie, Vertica, Tableau, and SAP Business Objects.
- Worked closely with the data science team to coordinate data requirements for various machine learning models.
- Worked in an Agile, test-driven development environment.
- Used Cloudera distribution and its ecosystem for the entire project.
- Mentored engineers and interns, and documented new or updated processes to facilitate knowledge sharing.
Confidential
Big Data Engineer
Responsibilities:
- Coordinated with the UI team on the different data requirements.
- Worked closely with the data science team to coordinate data requirements for various machine learning models.
- Worked on data gathering from other existing Confidential solutions and clients.
- Worked closely with the crawler team to crawl data from various databases.
- Worked on ingesting the client data into our cloud.
- Worked on normalization and standardization of data.
- Used Kafka to generate notifications for data arriving from various sources.
- Created Storm topologies to process the data in real time and loaded data in HBase tables.
- Created various data mappings to process data from different sources for ETL.
- Worked in an Agile, test-driven development environment.
- Used Cloudera distribution and its ecosystem for the entire project.
Confidential
Data Scientist
Responsibilities:
- Used machine learning methods to predict accidents from train sensor data, weather data, and schedule data, which helped the railroad avoid accidents.
- Worked on designing a predictive model that predicts the occurrence of accidents given multiple variables (for example, train type, day of week, time, and engineer’s age).
- Used a data mining algorithm to find frequent alert patterns, implemented using RHadoop.
- Wrote various Hive queries to extract and transform data for various variables from different tables.
- Developed custom User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R.
Confidential
Assistant Systems Engineer
Responsibilities:
- Planned and executed all phases of the software lifecycle, including requirements gathering, design, development, and testing.
- Managed the creation of functional and technical design documentation.
- Worked on a POC for sending notifications for various departments via automated email.
Confidential
Assistant Systems Engineer
Responsibilities:
- Worked with a support team of 30 people on applications such as SMS Gateway, Content Protection, and Mobile Device Configurator.
- Worked on BMC Remedy ITSM tool in support areas of Incident, Problem, Change, Release and Configuration Management.
- Managed the ticketing system with relevant updates after troubleshooting and resolving application issues.
- Responsible for overseeing the deployment and support of the application.
- Interacted with software application developers and customer service teams to clarify design specifications, test requirements, and address defect resolutions.
