Big Data Developer Resume
Minneapolis, MN
SUMMARY:
- 8+ years of Information Technology experience, including 4+ years in Hadoop.
- Proficient with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Flume, and HBase.
- Experienced in migrating data from HDFS to relational databases and vice versa using Sqoop.
- Experienced with data formats such as JSON, Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and gzip.
- Worked on the RDBMS system using PL/SQL to create packages, procedures, functions, triggers as per the business requirements.
- Worked on writing applications on NoSQL databases like HBase, Cassandra and MongoDB.
- Good knowledge in using NiFi to automate the data movement between Hadoop systems.
- Developed Batch Processing jobs using Map Reduce, Pig and Hive.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
- Used Oozie and Control-M workflow engines for managing and scheduling Hadoop jobs.
- Worked on AWS cloud services like S3, EMR and EC2.
- Knowledge of Spark and Scala, mainly framework exploration for the transition from Hadoop/MapReduce to Spark.
- Worked on a Kafka application to produce near-real-time data using the Apache Kafka Connect framework.
- Experience using Apache Kafka to develop data pipelines that treat logs as streams of messages via producers and consumers.
- Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Utilized Kafka and Flume to ingest real-time and near-real-time streaming data into HDFS from different sources.
- Followed Test-Driven Development within Agile/Scrum methodology to produce high-quality software.
- Enhanced and optimized the product's Spark code to aggregate, group, and run data mining tasks using the Spark framework.
- Experienced in working with Hadoop/Big-Data storage and analytical frameworks over Amazon AWS cloud.
- Strong SQL, PL/SQL, and ETL knowledge. Experienced in writing, testing, and implementing triggers, stored procedures, functions, and packages at the database and form level using PL/SQL, and involved in performance tuning.
- Worked with BI tools like Tableau for report creation and further analysis.
- Skilled in using ZooKeeper to provide coordination services to the cluster.
- Designed and developed automation test scripts using Python.
- Worked on Oozie for managing Hadoop jobs.
- Hands on experience on Hortonworks and Cloudera Hadoop environments.
- Good knowledge of machine learning algorithms, both supervised and unsupervised.
- Self-motivated and responsible, with good time management, strong written, verbal, and listening skills, and a commitment to cooperative teamwork.
- Good team player with a reputation for integrity and the ability to work across multiple areas.
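As a minimal illustration of the MapReduce-style batch aggregation described above, here is a plain-Python sketch of the map/shuffle/reduce pattern (the log lines and field layout are hypothetical; a real job would run over HDFS input splits):

```python
from collections import defaultdict

# Hypothetical weblog lines; in a real MapReduce job these would be HDFS input splits.
logs = [
    "user1 GET /home",
    "user2 GET /cart",
    "user1 POST /checkout",
]

def map_phase(line):
    """Map: emit a (user, 1) pair for each log line."""
    user = line.split()[0]
    yield (user, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key and sum the counts per user."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

pairs = [kv for line in logs for kv in map_phase(line)]
request_counts = reduce_phase(pairs)
```

The same mapper/reducer split is what a Java MapReduce job or a Pig script expresses; only the framework plumbing differs.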
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Kafka, Oozie, Spark, ZooKeeper, NiFi
Hadoop Technologies and Distributions: Apache Hadoop, YARN, Cloudera CDH3, CDH4
Operating Systems: Linux (CentOS, Ubuntu), Windows (XP/7/8/10)
Languages: Java, C/C++, Shell scripting, Pig Latin, Scala, Python
Databases: MySQL, Teradata, DB2, Oracle
NoSQL: HBase, Cassandra, MongoDB
IDE Tools: Eclipse, NetBeans
Web Development: HTML, XML, JavaScript, Servlets, JSP
Application Servers: Apache Tomcat, JDBC, ODBC
BI Tools: Power BI, Tableau, Talend
PROFESSIONAL EXPERIENCE:
Confidential, Minneapolis, MN
Big Data Developer
Responsibilities:
- Worked with data extracted from two different sources: MySQL and web servers.
- Used Sqoop to import and export data between HDFS and the RDBMS for visualization and report generation.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Worked on importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive.
- Handled importing data from various data sources, performed transformations using Hive, SparkSQL and loaded data into HDFS.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, HBase.
- Developed several new MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
- Involved in migrating ETL processes from Oracle to Hive to enable easier data manipulation.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used the Kafka HDFS connector to export data from Kafka topics to HDFS files in a variety of formats, and used ZooKeeper as the coordinator between the Kafka brokers.
- Scheduled Spark Streaming jobs using Oozie for continuous job tracking.
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
- Explored Spark to improve performance and optimize existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Worked on AWS Cloud with services such as S3, IAM, EMR, and Service Catalog.
- Experienced in AWS cost optimization; created a Lambda function to auto-terminate idle clusters based on metrics.
- Extensively used ODI to load data from Oracle, XML files, and flat files.
- Used Oracle Data Integrator (ODI) to develop processes for extracting, cleansing, transforming, integrating, and loading data into the data warehouse database.
- Responsible for running Hadoop streaming jobs to process terabytes of XML data, utilizing cluster coordination services through ZooKeeper.
- Worked closely with Agile development team to develop continuous integration/continuous delivery in the delivery of product in an open source environment using tools such as Jenkins.
- Built a CI/CD pipeline to automate the process using Python scripts.
- Created visual trends and calculations in Tableau on customers and products data as per client requirement.
Environment: HDFS, MapReduce, Hive, Pig, Sqoop, RDBMS, HBase, ZooKeeper, Shell Scripting, Spark, Scala, Kafka, MongoDB.
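The partition-and-aggregate pattern behind the Hive reporting work in this role can be sketched in plain Python (the sales records and the `dt` partition key are hypothetical; the real job would run as HiveQL over a partitioned, bucketed table):

```python
from collections import defaultdict

# Hypothetical rows; in Hive these would live in a table partitioned by `dt`.
rows = [
    {"dt": "2020-01-01", "store": "A", "amount": 10.0},
    {"dt": "2020-01-01", "store": "B", "amount": 5.0},
    {"dt": "2020-01-02", "store": "A", "amount": 7.5},
]

def metrics_by_partition(rows):
    """Group rows by the partition key and total each partition,
    analogous to `SELECT dt, SUM(amount) FROM t GROUP BY dt`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["dt"]] += row["amount"]
    return dict(totals)

daily_totals = metrics_by_partition(rows)
```

Partitioning in Hive makes this kind of per-day aggregation cheap because each `dt` value maps to its own HDFS directory, so only the relevant partitions are scanned.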
Confidential, Denver, CO
Hadoop Developer
Responsibilities:
- Involved in loading data from various servers into HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Used Sqoop to import customer information from the MySQL database into HDFS for data processing.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Optimized Hive queries to extract customer information from HDFS.
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Experience in managing and reviewing Hadoop log files.
- Worked on creating program/application to perform transformation on the given Datasets using Spark with Scala.
- Implemented partitioning, bucketing in Hive for better organization of the data.
- Used SerDes in Hive to convert JSON-format data to CSV format for loading into tables.
- Designed workflows by scheduling Hive processes for log file data streamed into HDFS using Flume.
- Developed Pig scripts and UDF's as per the Business logic.
- Experience in cluster coordination using Zookeeper.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Scheduled and continuously monitored data streaming with Oozie.
- Ensured fault tolerance in the presence of machine failures using the streaming tool.
- Reported the data to analysts for further tracking of trends across various consumers.
Environment: Sqoop, MapReduce, Spark, Pig, Hive, Oozie, ZooKeeper, Java, Shell scripting
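The JSON-to-CSV conversion mentioned above was handled by SerDes in Hive; the underlying transformation can be sketched in plain Python (the record fields are hypothetical):

```python
import csv
import io
import json

# Hypothetical line-delimited JSON input, one record per line.
json_lines = [
    '{"id": 1, "name": "alice"}',
    '{"id": 2, "name": "bob"}',
]

def json_lines_to_csv(lines, fields):
    """Parse one JSON object per line and write the named fields as CSV rows."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for line in lines:
        record = json.loads(line)
        writer.writerow({f: record[f] for f in fields})
    return out.getvalue()

csv_text = json_lines_to_csv(json_lines, ["id", "name"])
```

In Hive the equivalent is declaring a JSON SerDe on the source table and inserting into a table stored as text with comma delimiters; the SerDe does the per-record parsing that `json.loads` does here.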
Confidential, Omaha, NE
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Wrote MapReduce jobs to generate reports on the number of activities created per day, dumped from multiple sources, with the output written back to HDFS.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
- Developed Pig Latin scripts to aggregate the log files of the business clients.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Involved in writing shell scripts in scheduling and automation of tasks.
- Involved in ETL, data integration, and migration; imported data using Sqoop to load data from Oracle into HDFS on a regular basis.
- Wrote backend code in Java to interact with the database using JDBC.
- Helped and directed the testing team in getting up to speed on Hadoop data testing.
- Coordinated with testing team for bug fixes and created documentation for recorded data, agent usage and release cycle notes.
Environment: MapReduce, Flume, Pig, Hive, Sqoop, Oozie, Java, MySQL, Shell Scripting.
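The daily/weekly/monthly batch cycles coordinated through Oozie above reduce to date-based trigger logic; a plain-Python sketch (the specific cycle rules here are assumptions for illustration, not the actual Oozie coordinator configuration):

```python
from datetime import date

def cycles_due(run_date):
    """Return which batch cycles should fire on a given date.
    Assumed rules: daily always, weekly on Mondays, monthly on the 1st."""
    due = ["daily"]
    if run_date.weekday() == 0:  # Monday
        due.append("weekly")
    if run_date.day == 1:
        due.append("monthly")
    return due

# date(2020, 6, 1) was a Monday and the first of the month,
# so all three cycles would fire together that day.
```

In production this logic lives in Oozie coordinator XML (frequency and start-time attributes) rather than application code, which keeps scheduling declarative and visible in one place.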
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Worked on designing the content and delivering the solutions based on understanding the requirements.
- Efficiently dealt with exceptions and flow control.
- Worked on Object Oriented Programming concepts.
- Involved in the designing of the Application, and various design patterns.
- Designed a MySQL database to store billing details.
- Used Oracle as Database and involved in writing SQL scripts, PL/ SQL code for procedures and functions.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Used Log4j to print logging, debugging, warning, and info messages on the server console.
- Used Eclipse for writing code and CVS for version control.
Environment: Java, JSP, Servlets, JDBC, JavaScript, MySQL, JUnit, Eclipse IDE.
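The billing database access above was written in Java against JDBC; the same parameterized-query pattern is sketched here in Python with sqlite3 for brevity (the table and column names are hypothetical):

```python
import sqlite3

# In-memory stand-in for the billing database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billing (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Parameterized insert: the same pattern as a JDBC PreparedStatement,
# which avoids SQL injection and repeated query parsing.
conn.execute(
    "INSERT INTO billing (customer, amount) VALUES (?, ?)", ("alice", 42.50)
)
conn.commit()

row = conn.execute(
    "SELECT customer, amount FROM billing WHERE customer = ?", ("alice",)
).fetchone()
conn.close()
```

The `?` placeholders correspond to JDBC's `setString`/`setDouble` calls on a `PreparedStatement`; in both APIs the driver binds values separately from the SQL text.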