Hadoop Developer Resume
Chicago, IL
PROFESSIONAL SUMMARY:
- 8+ years of overall experience in the IT industry, including hands-on experience with Big Data ecosystem technologies such as MapReduce, Hive, HBase, Pig, Sqoop, Flume, Oozie and HDFS.
- Experienced in developing and implementing MapReduce programs using Hadoop to work with Big Data requirements.
- Hands-on experience with Big Data ingestion tools like Flume and Sqoop.
- Worked extensively on Sqoop to import and export data between RDBMS and HDFS.
- Worked with assorted flavors of Hadoop distributions such as Cloudera and Hortonworks.
- Experienced with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Experience in working with different kinds of data files such as XML, JSON, Parquet, Avro and Databases.
- Hands on NoSQL database experience with Apache HBase and MongoDB.
- Knowledge of job workflow scheduling and coordination tools like Oozie and ZooKeeper.
- Good knowledge of Apache Spark and Scala.
- Good knowledge of Hadoop architecture and components such as the HDFS framework, JobTracker, TaskTracker, NameNode, DataNode, MRv1 and MRv2 (YARN).
- Experience with various scripting languages like Linux/Unix shell scripts, Python 2.7 and Scala.
- Involved in configuring and developing Hadoop environments on AWS cloud services such as Lambda, S3, EC2, EMR (Elastic MapReduce), Redshift and CloudWatch.
- Experienced in implementing Spark RDD transformations and actions to support business analysis (a representative sketch follows this summary).
- Used Flume to collect, aggregate and store the web log data onto HDFS.
- Used ZooKeeper for various types of centralized configuration.
- Extensive knowledge of and experience with real-time data streaming technologies like Kafka, Storm and Spark Streaming.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Knowledge of running Hive queries through Spark SQL, integrated with the Spark environment and implemented in Scala.
- Hands-on experience with message brokers such as Apache Kafka.
- Experience in loading data using Hive and writing scripts for data transformations using Hive and Pig.
- Experience in creating Impala views on Hive tables for fast access to data.
- Developed UDFs and used them in Hive queries.
- Developed Pig Latin scripts for handling business transformations.
- Comprehensive knowledge of the Software Development Life Cycle (SDLC), with a thorough understanding of phases such as requirements analysis, design, development and testing.
- Strong analytical and problem-solving skills and multi-tasking abilities, with proven experience in using people and process knowledge to help enterprises make critical decisions.
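Representative sketch (not taken from any specific engagement below): a minimal Scala example of Spark RDD transformations and actions over web-log data; the HDFS path and tab-separated log layout are assumed for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddTransformationSketch {
  def main(args: Array[String]): Unit = {
    // Local master for illustration only; a real job would run on YARN.
    val conf = new SparkConf().setAppName("rdd-transformation-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical web-log lines: "timestamp<TAB>userId<TAB>url".
    val logs = sc.textFile("hdfs:///data/weblogs/*")

    // Transformations are lazy: parse each line and count hits per URL.
    val hitsPerUrl = logs
      .map(_.split("\t"))
      .filter(_.length == 3)
      .map(fields => (fields(2), 1))
      .reduceByKey(_ + _)

    // Actions trigger execution: print the 20 most-visited URLs.
    hitsPerUrl
      .sortBy({ case (_, count) => count }, ascending = false)
      .take(20)
      .foreach { case (url, count) => println(s"$url\t$count") }

    sc.stop()
  }
}
```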
TECHNICAL SKILLS:
Hadoop Ecosystem Development: HDFS, MapReduce, Hive, Pig, Impala, Scala, Spark, Oozie, Flume, Sqoop, Kafka, HBase, ZooKeeper.
Hadoop Distributions: Cloudera, Hortonworks.
Databases (SQL & NoSQL): Oracle, MS SQL Server, MySQL, Teradata, PL/SQL, T-SQL, DB2, HBase, MongoDB, Cassandra.
Languages: Java, Python, C/C++, Scala, SAS, R.
Scripting Languages: Core Java, Pig Latin, AngularJS, Python 2.7 & Scala.
Operating Systems: Linux, Unix, Windows.
Reporting Tools: Pentaho, Tableau, BIRT.
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, Chicago, IL
Responsibilities:
- Working in an Agile team to deliver and support business objectives, using Java, Python, shell scripting and other related technologies to acquire, ingest, transform and publish data to and from the Hadoop ecosystem.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Worked extensively on Apache NiFi to build data flows for the existing Oozie jobs covering incremental loads, full loads and semi-structured data, to pull data from REST APIs into Hadoop, and to automate the NiFi flows to run incrementally.
- Used Flume to collect, aggregate and store the web log data onto HDFS.
- Performed Data Cleansing using Python and loaded into the target tables.
- Performed logical implementation and interaction with HBase.
- Used Scala to store streaming data to HDFS and to implement Spark for faster processing of data.
- Integrated user data from Cassandra into HDFS and integrated Cassandra with Storm for real-time user attribute lookups.
- Performed daily Sqoop incremental imports scheduled through Oozie.
- Installed and configured Hadoop MapReduce, HDFS, developed MapReduce jobs in Java for data cleaning and pre-processing.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Created Pig scripts to transform the HDFS data and loaded the data into Hive external table.
- Worked on large-scale Hadoop YARN cluster for distributed data processing and analysis using Connectors, Spark core, Spark SQL, Sqoop, Pig, Hive and NoSQL databases.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing (see the sketch after this list).
- Optimized Hive queries using map-side joins, dynamic partitioning and bucketing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Implemented Spark RDD transformations and actions to support business analysis.
- Connected to HDFS using Pentaho Kettle to read data from Hive tables and perform analysis.
- Worked on Spark Streaming and Spark SQL to run sophisticated applications on Hadoop.
- Used Oozie and Oozie coordinators to deploy end-to-end processing pipelines and schedule the workflows.
- Worked on the quorum concept with Kafka and ZooKeeper.
- Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications on failures.
- Deployed reports on Pentaho BI Server to give central web access to the users.
- Created several dashboards in Pentaho using Pentaho Dashboard Designer.
- Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
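Representative sketch of the Spark SQL / Hive pattern referenced in this list (Spark 1.x HiveContext API; the database, table and column names are hypothetical, not from the actual project).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveOnSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-on-spark-sketch"))
    val hiveContext = new HiveContext(sc)

    // Read an existing Hive table registered in the metastore
    // (database, table and column names here are hypothetical).
    val clicks = hiveContext.sql(
      """SELECT user_id, page, event_date
        |FROM   web_analytics.click_events
        |WHERE  event_date >= '2017-01-01'""".stripMargin)

    // Equivalent of a Hive GROUP BY, executed as Spark transformations in memory.
    val dailyCounts = clicks.groupBy("event_date", "page").count()

    // Write the result back to a Hive table for downstream reporting.
    dailyCounts.write.mode("overwrite").saveAsTable("web_analytics.daily_page_counts")

    sc.stop()
  }
}
```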
Environment: Hadoop, CDH 4, CDH 5, Scala, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, Flume, Spark SQL, Spark Streaming, MapR, NiFi, Pentaho, Python, UNIX Shell Scripting and Cassandra.
Hadoop Developer
Confidential, San Jose, CA
Responsibilities:
- Worked on a live 24-node Hadoop cluster running HDP 2.2.
- Created data import and export jobs to copy data to and from HDFS using Sqoop.
- Worked with incremental-load Sqoop jobs to populate HAWQ external tables and load them into internal tables.
- Created external and internal tables using HAWQ.
- Worked with the Spark Core, Spark Streaming and Spark SQL modules of Spark.
- Hands-on experience in various Big Data application phases such as data ingestion, data analytics and data visualization.
- Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.
- Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
- Very well versed in workflow scheduling and monitoring tools such as Oozie, Hue and ZooKeeper.
- Experience in working with Flume to load the log data from multiple sources directly into HDFS.
- Installed and configured MapReduce, Hive and HDFS; implemented CDH5 and HDP clusters.
- Assisted with performance tuning, monitoring and troubleshooting.
- Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Experience in moving streaming data into clusters through Kafka and Spark Streaming (see the sketch after this list).
- Optimized HiveQL/Pig scripts by using execution engines such as Tez and Spark.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Worked with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
- Experienced in reviewing Hadoop log files to debug failures.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Worked with Pig, the NoSQL database HBase and Sqoop for analyzing the Hadoop cluster as well as big data.
- Knowledge of workflow/schedulers like Oozie/crontab/Autosys.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Creating Hive tables and working on them for data analysis to meet the business requirements.
- Developed a data pipeline using Spark and Hive to ingest, transform and analyze data.
- Experience in using SequenceFile, RCFile, Avro and HAR file formats.
- Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
- Used Flume to load the application server logs into HDFS.
- Automated backups with Linux shell scripts to transfer data to S3 buckets.
- Experience in UNIX Shell scripting.
- Hands on experience using HP ALM. Created test cases and uploaded into HP ALM.
- Automated incremental loads to load data into production cluster.
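Representative sketch of the Kafka / Spark Streaming pattern referenced in this list (Spark 1.x direct-stream API for Kafka 0.8; broker addresses, topic name and output path are hypothetical).

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    // Submitted via spark-submit, so the master comes from the cluster config.
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-streaming-sketch"), Seconds(30))

    // Broker list and topic name are hypothetical.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("app-logs")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Each record is a (key, value) pair; keep only error lines and persist
    // each micro-batch to HDFS for later analysis in Hive.
    stream
      .map { case (_, line) => line }
      .filter(_.contains("ERROR"))
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty()) {
          rdd.saveAsTextFile(s"hdfs:///data/app-logs/errors/${time.milliseconds}")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```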
Environment: Hadoop, MapReduce, AWS, HDFS, Hive, HBASE, Sqoop, Pig, Flume, Oracle, Teradata, PL/SQL, Java, Shell Scripting, HP ALM.
Hadoop Developer
Confidential, Atlanta, GA
Responsibilities:
- Ingested Click-Stream data from FTP servers and S3 buckets using custom Input Adapters.
- Designed and developed Spark jobs to enrich the click-stream data (a representative sketch follows this list).
- Implemented Spark jobs using Scala and used Spark SQL to access Hive tables in Spark for faster data processing.
- Involved in performance tuning of Spark jobs by caching and taking full advantage of the cluster environment.
- Worked with Data-science team to gather requirements for data mining projects.
- Developed a Kafka producer and a Spark Streaming consumer for working with live click-stream feeds.
- Worked on different file formats (Parquet, TextFile) and different compression codecs (Gzip, Snappy, LZO).
- Wrote complex Hive queries involving external, dynamically partitioned Hive tables that store rolling time-window user viewing history.
- Worked with data science team to build various predictive models with Spark MLLIB.
- Experience in troubleshooting various Spark applications using spark-shell, spark-submit.
- Good experience in writing Map Reduce programs in Java on MRv2/YARN environment.
- Developed Java code to generate, compare and merge Avro schema files.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
- Designed and developed external and managed Hive tables with data formats such as Text, Avro, SequenceFile, RC, ORC and Parquet.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Implemented Sqoop jobs to perform full and incremental imports of data from relational tables into Hive tables in formats such as Text, Avro and SequenceFile.
- Developed ETL scripts for Data acquisition and Transformation using Talend.
- Good hands-on experience in writing HiveQL statements as per the requirements.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive table.
- Used cloud computing on a multi-node cluster, deployed Hadoop applications on AWS with S3 storage, and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Used Impala and Tableau to create various reporting dashboards.
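Representative sketch of the click-stream enrichment pattern referenced in this list (Spark 1.6-style DataFrame API; the S3 and HDFS paths, Hive table and column names are hypothetical).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ClickStreamEnrichmentSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("clickstream-enrichment-sketch"))
    val hiveContext = new HiveContext(sc)

    // Prefer Snappy-compressed Parquet for the enriched output.
    hiveContext.setConf("spark.sql.parquet.compression.codec", "snappy")

    // Raw click-stream JSON landed by the ingest adapters (hypothetical S3 path).
    val rawClicks = hiveContext.read.json("s3n://example-bucket/clickstream/raw/")

    // Enrich with a small user-dimension Hive table and keep only the columns
    // the downstream models need (left join keeps clicks with no matching user).
    val users = hiveContext.table("analytics.user_dim")
    val enriched = rawClicks
      .join(users, Seq("user_id"), "left_outer")
      .select("user_id", "page_url", "referrer", "segment", "event_date")

    // Write Parquet partitioned by event_date so an external Hive table over
    // this path can prune partitions for rolling-window queries.
    enriched.write
      .mode("append")
      .partitionBy("event_date")
      .parquet("hdfs:///warehouse/clickstream/enriched/")

    sc.stop()
  }
}
```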
Environment: Spark, Hive, Impala, Sqoop, HBase, Tableau, Scala, Talend, Eclipse, YARN, Oozie, Java, Cloudera, Unix.
Java Developer
Confidential
Responsibilities:
- Involved in analysis, design and coding in a Java/JSP front-end environment.
- Responsible, as a feature owner, for developing use cases and class and sequence diagrams for the modules using UML and Rational Rose Enterprise Edition 2000.
- Developed application using Spring, Servlets, JSP and EJB.
- Implemented MVC (Model View Controller) architecture.
- Designed the Application flow using Rational Rose.
- Used web servers like Apache Tomcat.
- Implemented Application prototype using HTML, CSS and JavaScript.
- Developed the user interfaces with the Spring tag libraries.
- Developed build and deployment scripts using Apache Ant to customize WAR, EAR and EJB JAR files.
- Prepared field validation and scenario-based test cases using JUnit, and tested the module in three phases: unit testing, system testing and regression testing.
- Coded and unit tested according to client standards.
- Used the Oracle 8i database for data storage and coded stored procedures, functions and triggers.
- Wrote DB queries using SQL for interacting with database.
- Designed and developed XML processing components for dynamic menus in the application.
- Created components using Java, Spring and JNDI.
- Prepared Spring deployment descriptors using XML.
- Developed the entire application using Eclipse and deployed it on WebSphere Application Server.
- Handled problem management during QA, implementation and post-production support.
- Developed a logging component using Apache Log4j to log messages and errors, and wrote test cases using JUnit to verify the code under different conditions.
Environment: Java, HTML, Spring, JSP, Servlets, DBMS, Web Services, JNDI, JDBC, Eclipse, WebSphere, XML/XSL, Apache Tomcat, TOAD, Oracle, MySQL, JUnit, Log4j, SQL, PL/SQL, CSS.
SQL Developer
Confidential
Responsibilities:
- Developed SQL scripts to perform various joins, subqueries, nested queries and insert/update operations, and created and modified existing stored procedures, triggers, views and indexes.
- Responsible for maintaining databases.
- Performed intermediate queries using SQL, including inner/outer/left joins and UNION/INTERSECT.
- Responsible for implementing and monitoring database systems.
- Designed and modified physical databases with development teams.
- Worked with business analysts and users to understand the requirements.
- Responsible for designing advanced SQL queries, procedures, cursors and triggers.
- Built data connections to the database using MS SQL Server.
- Worked on a project to extract data from XML files into SQL tables and generate data file reports using SQL Server 2008.
- Created Drill-through, Drill-down, Cross Tab Reports, Cached reports and Snapshot Report to give all the details of various transactions like closed transactions, pending approvals and summary of transactions and scheduled this report to run on monthly basis.
- Created reports and designed graphical representation of analyzed data using reporting tools.
Environment: MS SQL Server 2008/2005, SQL Server Integration Services 2008, SQL Server Analysis Services 2008, MS Visual, Windows 2003/2000.