
Big Data Developer Resume


Minneapolis, MN

SUMMARY:

  • 8+ years of Information Technology experience, including 4+ years of experience in Hadoop.
  • Proficient with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Flume, and HBase.
  • Experienced in migrating data between HDFS and relational databases using Sqoop (a Sqoop sketch follows this list).
  • Experience with data formats such as JSON, Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and Gzip.
  • Worked on RDBMS systems, using PL/SQL to create packages, procedures, functions, and triggers per business requirements.
  • Wrote applications against NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Good knowledge of using NiFi to automate data movement between Hadoop systems.
  • Developed Batch Processing jobs using Map Reduce, Pig and Hive.
  • Used the Spark API on Cloudera Hadoop YARN to perform analytics on data.
  • Used the Oozie and Control-M workflow engines for managing and scheduling Hadoop jobs.
  • Worked on AWS cloud services like S3, EMR and EC2.
  • Knowledge of Spark and Scala, mainly framework exploration for the transition from Hadoop MapReduce to Spark.
  • Worked on a Kafka application to produce near-real-time data using the Apache Kafka Connect framework.
  • Exposure to using Apache Kafka to build data pipelines that carry logs as streams of messages between producers and consumers.
  • Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
  • Used Kafka and Flume to ingest real-time and near-real-time streaming data into HDFS from different sources.
  • Followed test-driven development within an Agile/Scrum methodology to produce high-quality software.
  • Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework.
  • Experienced in working with Hadoop/Big Data storage and analytical frameworks on the AWS cloud.
  • Strong SQL, PL/SQL, and ETL knowledge; good experience writing, testing, and implementing triggers, stored procedures, functions, and packages at the database and form level using PL/SQL, and involved in performance tuning.
  • Worked with BI tools like Tableau for report creation and further analysis.
  • Skilled in using Zookeeper to provide coordination services to the cluster.
  • Designed and developed automation test scripts using Python.
  • Worked on Oozie for managing Hadoop jobs.
  • Hands-on experience with Hortonworks and Cloudera Hadoop environments.
  • Good knowledge of machine learning algorithms, both supervised and unsupervised.
  • Self-motivated and responsible, with strong time management, good written, verbal, and listening skills, and a commitment to cooperative teamwork.
  • Good team player with a reputation for integrity and the ability to work across multiple areas.
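
A minimal sketch of the kind of Sqoop import/export commands referenced above; the connection string, credentials, table, and directory names are illustrative placeholders, not actual project values.

    # Illustrative only: import a table from MySQL into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table customers \
      --target-dir /data/raw/customers \
      --num-mappers 4

    # Export aggregated results from HDFS back to the relational database
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table customer_summary \
      --export-dir /data/summary/customers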

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Kafka, Oozie, Spark, ZooKeeper, NiFi

Hadoop Technologies and Distributions: Apache Hadoop, YARN, Cloudera CDH3, CDH4.

Operating Systems: Linux (CentOS, Ubuntu), Windows (XP/7/8/10)

Languages: Java, C/C++, Shell scripting, Pig Latin, Scala, Python.

Databases: MySQL, Teradata, DB2, Oracle

NoSQL: HBase, Cassandra, MongoDB

IDE Tools: Eclipse, NetBeans

Web Development: HTML, XML, JavaScript, Servlets, JSP.

Application Servers: Apache Tomcat; Data Access: JDBC, ODBC

BI and ETL Tools: Power BI, Tableau, Talend

PROFESSIONAL EXPERIENCE:

Confidential, Minneapolis, MN

Big Data Developer

Responsibilities:

  • Worked with data extracted from two different sources: MySQL and web servers.
  • Used Sqoop to import and export data between HDFS and the RDBMS for visualization and report generation.
  • Responsible for creating Hive external tables, loading data into them, and querying the data using HQL.
  • Worked on importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive.
  • Handled importing data from various data sources, performed transformations using Hive and Spark SQL, and loaded the data into HDFS.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, HBase.
  • Developed several new MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
  • Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used the Kafka HDFS connector to export data from Kafka topics to HDFS files in a variety of formats, and used ZooKeeper as the coordinator between Kafka brokers (a connector configuration sketch follows this list).
  • Scheduled Spark Streaming jobs using Oozie and continuously tracked them.
  • Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
  • Explored Spark, improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs (see the Spark sketch after this list).
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
  • Worked on the AWS cloud with various AWS services such as S3, IAM, EMR, and Service Catalog.
  • Experience in AWS cost optimization; created a Lambda function to auto-terminate idle clusters based on metrics.
  • Extensively used ODI to load data from Oracle, XML files, and flat files.
  • Used Oracle Data Integrator (ODI) to develop processes for extracting, cleansing, transforming, integrating, and loading data into the data warehouse database.
  • Responsible for running Hadoop streaming jobs to process terabytes of XML data, and utilized cluster coordination services through ZooKeeper.
  • Worked closely with the Agile development team on continuous integration/continuous delivery of the product in an open-source environment using tools such as Jenkins.
  • Built a CI/CD pipeline to automate the process using Python scripts.
  • Created visual trends and calculations in Tableau on customers and products data as per client requirement.
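
A brief Spark/Scala sketch of the kind of DataFrame aggregation work described above; the application name, paths, and column names (customer_id, event_type) are hypothetical examples, not the actual project code.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object UsagePatterns {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("usage-patterns")
          .getOrCreate()

        // Hypothetical input: files staged in HDFS by the ingestion pipeline
        val logs = spark.read.parquet("/data/staged/web_logs")

        // Group and aggregate to surface per-customer usage patterns
        val usage = logs
          .groupBy(col("customer_id"), col("event_type"))
          .agg(count("*").as("event_count"))

        // Write results back to HDFS for downstream reporting (e.g. Tableau)
        usage.write.mode("overwrite").parquet("/data/curated/usage_patterns")

        spark.stop()
      }
    }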
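
A hedged sketch of a Kafka Connect HDFS sink configuration of the sort the connector bullet above describes, assuming the Confluent HDFS sink connector; the connector name, topic, URL, and sizes are placeholders.

    # Illustrative Kafka Connect HDFS sink properties (all values are placeholders)
    name=weblog-hdfs-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    tasks.max=2
    topics=weblog-events
    hdfs.url=hdfs://namenode:8020
    topics.dir=/data/kafka
    flush.size=10000
    format.class=io.confluent.connect.hdfs.avro.AvroFormat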

Environment: HDFS, Hadoop MapReduce, Hive, Pig, Sqoop, RDBMS, HBase, ZooKeeper, Shell Scripting, Spark, Scala, Kafka, MongoDB.

Confidential, Denver, CO

Hadoop Developer

Responsibilities:

  • Involved in loading data from various servers into HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Used Sqoop to import customer information from a MySQL database into HDFS for data processing.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Optimized Hive queries to extract the customer information from HDFS.
  • Wrote Pig Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
  • Experience in managing and reviewing Hadoop log files.
  • Worked on creating programs/applications to perform transformations on the given datasets using Spark with Scala.
  • Implemented partitioning and bucketing in Hive for better organization of the data (see the HiveQL sketch after this list).
  • Used SerDes in Hive to convert JSON-format data into CSV format for loading into tables.
  • Designed workflows by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
  • Developed Pig scripts and UDFs per the business logic.
  • Experience in cluster coordination using Zookeeper.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Data streaming was continuously scheduled and monitored by Oozie.
  • Provided fault tolerance in the presence of machine failures using the streaming tool.
  • Reported the data to analysts for further tracking of trends across various consumers.
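
A short HiveQL sketch of the partitioning, bucketing, and JSON SerDe usage mentioned above; all table, column, and path names are illustrative.

    -- Illustrative partitioned, bucketed target table
    CREATE EXTERNAL TABLE customer_events (
      customer_id BIGINT,
      event_type  STRING,
      amount      DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
    LOCATION '/data/warehouse/customer_events';

    -- Staging table over raw JSON, parsed with a JSON SerDe
    CREATE EXTERNAL TABLE customer_events_raw (
      customer_id BIGINT,
      event_type  STRING,
      amount      DOUBLE,
      event_date  STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/data/raw/customer_events_json';

    -- Load from the raw staging table into the partitioned table
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    INSERT OVERWRITE TABLE customer_events PARTITION (event_date)
    SELECT customer_id, event_type, amount, event_date
    FROM customer_events_raw;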

Environment: Sqoop, MapReduce, Spark, Pig, Hive, Oozie, ZooKeeper, Java, Shell Scripting.

Confidential, Omaha, NE

Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
  • Wrote MapReduce jobs to generate reports on the number of activities created each day, pulled from multiple sources, with the output written back to HDFS.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
  • Developed Pig Latin scripts to aggregate the log files of the business clients (see the Pig Latin sketch after this list).
  • Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
  • Involved in writing shell scripts for scheduling and automating tasks.
  • Involved in ETL, data integration, and migration; imported data using Sqoop from Oracle into HDFS on a regular basis.
  • Wrote backend code in Java to interact with the database using JDBC.
  • Helped and directed the testing team to get up to speed on Hadoop data testing.
  • Coordinated with testing team for bug fixes and created documentation for recorded data, agent usage and release cycle notes.
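
A brief Pig Latin sketch of the kind of client log aggregation described above; the fields, paths, and delimiters are hypothetical.

    -- Illustrative log aggregation (paths and fields are placeholders)
    logs    = LOAD '/data/logs/clients' USING PigStorage('\t')
              AS (client_id:chararray, url:chararray, status:int, bytes:long);
    ok      = FILTER logs BY status == 200;
    grouped = GROUP ok BY client_id;
    stats   = FOREACH grouped GENERATE group AS client_id,
              COUNT(ok) AS requests, SUM(ok.bytes) AS total_bytes;
    STORE stats INTO '/data/reports/client_usage' USING PigStorage(',');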

Environment: MapReduce, Flume, Pig, Hive, Sqoop, Oozie, Java, MySQL, Shell Scripting.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Developed web components using JSP, Servlets, and JDBC (see the JDBC sketch after this list).
  • Implemented the database using SQL Server.
  • Worked on designing the content and delivering the solutions based on understanding the requirements.
  • Efficiently dealt with exceptions and flow control.
  • Worked on Object Oriented Programming concepts.
  • Involved in the design of the application and the use of various design patterns.
  • Designed a MySQL database to store billing details.
  • Used Oracle as the database and was involved in writing SQL scripts and PL/SQL code for procedures and functions.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Used Log4J to write logging, debugging, warning, and info messages to the server console.
  • Used Eclipse for writing code and CVS for version control.
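
A minimal Java sketch of the JDBC access pattern described above; the connection URL, credentials, table, and column names are illustrative placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class BillingDao {
        // Illustrative connection details only
        private static final String URL = "jdbc:mysql://localhost:3306/billing";

        public double findAmountDue(String customerId) throws Exception {
            try (Connection con = DriverManager.getConnection(URL, "app_user", "secret");
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT amount_due FROM invoices WHERE customer_id = ?")) {
                ps.setString(1, customerId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getDouble("amount_due") : 0.0;
                }
            }
        }
    }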

Environment: Java, JSP, Servlets, JDBC, JavaScript, MySQL, JUnit, Eclipse IDE.
