Sr Hadoop Developer Resume
Kansas City, Missouri
PROFESSIONAL SUMMARY:
- Around 9 years of professional IT work experience in Analysis, Design, Development, Deployment and Maintenance of critical software and big data applications.
- 5+ years of hands on experience with Hadoop and related Big Data technologies.
- Hands on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like Map Reduce, Hive, Pig, Hbase, Flume, YARN, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie, Cassandra and Zookeeper.
- Hands on experience with multiple distributions like Cloudera, Hortonworks and MapR.
- Experience in installing, configuring, supporting and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems and application architecture.
- Experience in automating Sqoop, Hive and Pig scripts using the Oozie workflow scheduler.
- Hands on experience in using MapReduce programming model for Batch processing of data stored in HDFS.
- Good experience in optimizing MapReduce algorithms using Mappers, Reducers, combiners and partitioners (a minimal driver sketch follows this summary).
- Good experience in Python and Shell scripting.
- Extensive experience in developing PIG Latin Scripts and using Hive Query Language for data analytics.
- Experience in using different file formats like CSV, SequenceFile, Avro, RC, ORC, JSON and Parquet, and different compression techniques like LZO, Gzip, Bzip2 and Snappy.
- Experience in big data ingestion tools like Sqoop, Flume and Apache Kafka.
- Experience in using Flume and Kafka to load the log data from multiple sources into HDFS.
- Hands on experience with NoSQL Databases like Hbase, MongoDB and Cassandra.
- Experience in retrieving data from databases like MySQL, Teradata, Informix, DB2 and Oracle into HDFS using Sqoop and ingesting it into Hbase and Cassandra.
- Good understanding and experience with Software Development methodologies like Agile and Waterfall.
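The following is a minimal sketch of the MapReduce tuning mentioned above, showing where a combiner and a custom partitioner are wired into a job driver. The EventCountDriver, EventMapper, SumReducer and PrefixPartitioner names, the comma-delimited input and the prefix-based partitioning rule are illustrative assumptions rather than code from any engagement below.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCountDriver {

    // Emits (eventType, 1) for each comma-delimited input line.
    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                eventType.set(fields[0]);
                ctx.write(eventType, ONE);
            }
        }
    }

    // Sums the counts per event type; also reused as the combiner to shrink shuffle volume.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    // Routes keys to reducers by the first character of the key (illustrative skew-control rule).
    public static class PrefixPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String s = key.toString();
            int prefixHash = s.isEmpty() ? 0 : s.charAt(0);
            return (prefixHash & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event-count");
        job.setJarByClass(EventCountDriver.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);        // combiner cuts map-side output before the shuffle
        job.setPartitionerClass(PrefixPartitioner.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```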
TECHNICAL SKILLSET:
Big Data Ecosystem: Hadoop, Map Reduce, YARN, Pig, Hive, Hbase, Flume, Sqoop, Impala, Oozie, Zookeeper, Apache Spark, Kafka, Scala, MongoDB, Cassandra.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks and MapR.
NoSQL Databases: Cassandra, MongoDB, Hbase, CouchDB.
Java: J2EE, JSP, CSS, jQuery, Servlets, HTML, JavaScript
Mainframe: JCL, COBOL, CICS, DB2.
Databases: MySQL, Oracle, DB2 for Mainframes, Teradata, Informix.
Operating Systems: Windows, Unix, Linux
Other Tools: Putty, WinSCP, FileZilla, Streamweaver, Compuset.
Languages: Java, SQL, HTML, JavaScript, JDBC, XML, and C.
Frameworks: Struts, Spring, Hibernate.
App/Web servers: WebSphere, WebLogic, JBoss, Tomcat.
PROFESSIONAL EXPERIENCE:
SR HADOOP DEVELOPER
Confidential, Kansas City, Missouri
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Optimized Hive queries and ran Hive on top of the Spark execution engine.
- Developed Kafka producers and consumers, Cassandra clients and Spark components on top of HDFS and Hive (a producer sketch follows this list).
- Populated HDFS and HBase with large volumes of data using Apache Kafka.
- Used Kafka to ingest data into the Spark engine.
- Hands on experience with Spark and Spark Streaming: creating RDDs and applying transformations and actions.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
- Built data pipelines using Kafka and Akka to handle terabytes of data.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
- Worked on indexing, scalability and query language support in Cassandra.
- Created Sqoop scripts for importing data from different data sources into Hive and Cassandra.
- Used Hue to run Hive queries and created day-level Hive partitions to improve performance.
- Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Developed a NiFi data flow to pull data from a REST API with context configuration enabled.
- Experience working on Talend ETL for performing data migration and data synchronization processes on the data warehouse.
- Installed and configured a Hadoop cluster on Amazon Web Services (AWS) for POC purposes.
- Worked on AWS services like EMR and EC2 for fast and efficient processing of Big Data.
- Responsible for maintaining and expanding the AWS cloud infrastructure.
- Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
- Wrote Python and shell scripts for various deployment and automation processes.
- Extracted data from MongoDB through Sqoop, placed it in HDFS and processed it.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Visualized the analytical results using Tableau visualization tool.
- Developed R scripts to implement predictive analysis graphs in Tableau.
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
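A minimal sketch of the kind of Kafka producer referenced in this role; the broker addresses, topic name, key and JSON payload are placeholders, not values from the actual project.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Keying by user id keeps one user's events on the same partition (per-key ordering).
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("click-events", "user-123",
                    "{\"page\":\"home\",\"ts\":1466000000}"));
            producer.flush();
        }
    }
}
```

Downstream, the same topic can be consumed by Spark Streaming or a Flume Kafka channel before the data lands in HDFS or HBase.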
Environment: Apache Spark, Kafka, Map Reduce, Cassandra, YARN, Sqoop, Oozie, HDFS, Hive, Pig, Java, Hadoop distribution of Cloudera 5.4/5.5, Linux, XML, Eclipse, MySQL.
SR HADOOP DEVELOPER
Confidential - Chicago, Illinois
Responsibilities:
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Developed ETL jobs to load data coming from various sources, such as mainframes and flat files, into a data warehouse.
- Configured Hive, Pig, Impala, Sqoop, Flume and Oozie in Cloudera (CDH5).
- Used Sqoop to import data from different relational databases into Cassandra tables and imported data from various sources into the Cassandra cluster using Java APIs.
- Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios.
- Involved in creating data-models for customer data using Cassandra Query Language.
- Developed multiple Map Reduce jobs in Java for data cleaning and preprocessing.
- Developed wrapper and utility automation scripts in Python.
- Good knowledge in using Apache NiFi to automate the data movement between different Hadoop systems.
- Wrote scripts to automate application deployments and configuration, and monitored jobs through YARN.
- Wrote MapReduce programs in Python using the Hadoop Streaming API.
- Involved in creating Hive tables and loading them with data and writing Hive queries.
- Migrated ETL processes from Microsoft SQL Server to Hadoop, using Pig as the data pipeline for easier data manipulation.
- Developed Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
- Experience with AWS components like Amazon EC2 instances, S3 buckets, CloudFormation templates and the Boto library.
- Hands on experience with various AWS services such as Redshift clusters and Route 53 domain configuration.
- Built reporting using Tableau.
- Involved in importing data from Oracle tables to HDFS and Hbase tables using Sqoop.
- Developed scripts that load data into Spark RDDs and perform in-memory computation to generate the output.
- Developed a Spark Streaming script that consumes topics from Kafka and periodically pushes batches of data to Spark for real-time processing (a consumer sketch follows this list).
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
- Experience with search technologies, including Elasticsearch and custom Solr query components.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Worked on different data sources such as Oracle, Netezza, MySQL, Flat files etc.
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Developed Talend jobs to move inbound files to HDFS file location based on monthly, weekly, daily and hourly partitioning.
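A minimal sketch of the Spark Streaming consumer described above, written against the spark-streaming-kafka-0-10 direct stream API; the broker address, consumer group, topic name and 30-second batch interval are assumed placeholders.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToSparkStreaming {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-to-spark");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");      // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "behavior-stream");            // placeholder consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("behavior-events"); // placeholder topic

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Pull out the message values from each micro-batch and count them as a simple sanity check.
        stream.map(ConsumerRecord::value)
              .foreachRDD(rdd -> System.out.println("records in batch: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}
```

The foreachRDD count is only a sanity check; in a real pipeline each micro-batch would be transformed and written out, for example to Cassandra or Hive.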
Environment: Cloudera, Map Reduce, Spark SQL, Spark Streaming, Pig, Hive, Flume, Hue, Oozie, Java, Eclipse, Zookeeper, Cassandra, Hbase, Talend, Github.
HADOOP DEVELOPER
Confidential - Milwaukee, Wisconsin
Responsibilities:
- Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, Hbase database and Sqoop.
- Experienced in implementing the Hortonworks distribution (HDP 2.1, HDP 2.2 and HDP 2.3).
- Developed MapReduce programs to run refined queries on big data.
- Experienced in working with Elastic MapReduce (EMR).
- Created Hive tables and worked on them for data analysis to meet the requirements.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Worked with business team in creating Hive queries for ad hoc access.
- In depth understanding of Classic MapReduce and YARN architectures.
- Implemented Hive Generic UDFs to encapsulate business logic (a minimal UDF sketch follows this list).
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig for ETL jobs.
- Developed Pig UDFs to pre-process the data for analysis.
- Deployed a Cloudera Hadoop cluster on AWS for Big Data analytics.
- Analyzed the data using Hive queries, Pig scripts, Spark SQL and Spark Streaming.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Apache NiFi to copy the data from local file system to HDFS.
- Developed a Spark Streaming script that consumes topics from Kafka and periodically pushes batches of data to Spark for real-time processing.
- Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
- Involved in creating generic Sqoop import script for loading data into Hive tables from RDBMS.
- Involved in continuous monitoring of operations using Storm.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented indexing of logs from Oozie into Elasticsearch.
- Designed, developed, unit tested and supported ETL mappings and scripts for data marts using Talend.
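A minimal sketch of a Hive generic UDF along the lines of the bullet above; the mask_account function and its masking rule are purely illustrative assumptions.

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

// Hypothetical generic UDF that masks all but the last four characters of a string column.
public class MaskAccountUDF extends GenericUDF {

    private transient StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("mask_account() takes exactly one string argument");
        }
        if (!(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentException("mask_account() expects a string argument");
        }
        inputOI = (StringObjectInspector) arguments[0];
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object raw = arguments[0].get();
        if (raw == null) {
            return null;
        }
        String value = inputOI.getPrimitiveJavaObject(raw);
        int keep = Math.min(4, value.length());
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < value.length() - keep; i++) {
            masked.append('*');
        }
        masked.append(value.substring(value.length() - keep));
        return new Text(masked.toString());
    }

    @Override
    public String getDisplayString(String[] children) {
        return "mask_account(" + children[0] + ")";
    }
}
```

Such a UDF is typically registered from HiveQL with CREATE TEMPORARY FUNCTION mask_account AS '<fully qualified class name>' before it is used in queries.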
Environment: Hortonworks, Hadoop, Map Reduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Apache Storm, Oozie, SQL, Flume, Spark, Hbase, Cassandra, Informatica, Java, Github.
HADOOP DEVELOPER
Confidential - Chicago, IL
Responsibilities:
- Analyzed data using Hadoop Components Hive and Pig.
- Experienced in development on the Cloudera distribution.
- Worked hands on with the ETL process.
- Developed Hadoop Streaming jobs to ingest large amount of data.
- Load and transform large data sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Involved in POCs comparing the performance of Spark SQL with Hive.
- Imported data from Teradata using Sqoop with the Teradata connector.
- Created sub-queries for filtering and faster execution, and created multiple join tables to fetch the required data.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Installed and set up HBase and Impala.
- Used Apache Impala to read, write and query the Hadoop data in HDFS, Hbase and Cassandra.
- Implemented Partitioning, Dynamic Partitions and Buckets in Hive.
- Supported MapReduce programs running on the cluster.
- Developed ETL test scripts based on technical specifications/Data design documents and Source to Target mappings.
- Configured Talend ETL on single and multi-server environments.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Bulk loaded data into Oracle using the JDBC template (a minimal sketch follows this list).
- Created Groovy scripts to load CSV files into Oracle tables.
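A minimal sketch of the Oracle bulk load above, assuming the "JDBC template" refers to Spring's JdbcTemplate; the STG_USAGE table, its columns and the UsageRecord type are hypothetical.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;

public class OracleBulkLoader {

    private final JdbcTemplate jdbcTemplate;

    public OracleBulkLoader(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Hypothetical record type produced by the upstream Hadoop job.
    public static class UsageRecord {
        public final String accountId;
        public final double usage;

        public UsageRecord(String accountId, double usage) {
            this.accountId = accountId;
            this.usage = usage;
        }
    }

    // Inserts all records in a single JDBC batch; STG_USAGE is a placeholder staging table.
    public void bulkLoad(final List<UsageRecord> records) {
        final String sql = "INSERT INTO STG_USAGE (ACCOUNT_ID, USAGE_AMT) VALUES (?, ?)";
        jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                UsageRecord r = records.get(i);
                ps.setString(1, r.accountId);
                ps.setDouble(2, r.usage);
            }

            @Override
            public int getBatchSize() {
                return records.size();
            }
        });
    }
}
```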
Environment: Cloudera, HDFS, Pig, Hive, Map Reduce, Python, Sqoop, Storm, LINUX, Hbase, Impala, Java, SQL, Cassandra, MongoDB, SVN.
JAVA/HADOOP DEVELOPER
Confidential - Boston, Massachusetts
Responsibilities:
- Developed JSP, JSF and Servlets to dynamically generate HTML and display the data to the client side.
- Used the Hibernate framework for persistence with an Oracle database.
- Written and debugged the ANT Scripts for building the entire web application.
- Developed web services in Java using SOAP and WSDL, and used WSDL to publish the services to other applications.
- Implemented messaging using the Java Message Service (JMS) API.
- Involved in managing and reviewing Hadoop log files.
- Installed and configured Hadoop, YARN, Map Reduce, Flume, HDFS, developed multiple Map Reduce jobs in Java for data cleaning.
- Coded Hadoop Map Reduce jobs for energy generation and PS.
- Coded with Servlets, a SOAP client and Apache CXF REST APIs to deliver data from our application to internal and external consumers.
- Worked on Cloudera distribution system for running Hadoop jobs on it.
- Expertise in writing Hadoop jobs to analyze data using Map Reduce, Hive, Pig, Solr and Splunk.
- Created SOAP web services using JAX-WS so that clients could consume them (a minimal sketch follows this list).
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice-versa.
- Experienced in designing and developing multi-tier scalable applications using Java and J2EE Design Patterns.
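A minimal sketch of a JAX-WS SOAP endpoint of the kind described above; the AccountLookupService name, its operation and the local publish URL are illustrative assumptions.

```java
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

// Hypothetical SOAP service contract; the service name and operation are illustrative.
@WebService
public class AccountLookupService {

    @WebMethod
    public String getAccountStatus(String accountId) {
        // In the real application this would delegate to the business layer; stubbed here.
        return "ACTIVE";
    }

    public static void main(String[] args) {
        // Publish on a local endpoint; JAX-WS generates the WSDL at ?wsdl.
        Endpoint.publish("http://localhost:8080/ws/account", new AccountLookupService());
        System.out.println("WSDL available at http://localhost:8080/ws/account?wsdl");
    }
}
```

Endpoint.publish is shown for brevity; in a container deployment the same annotated class would typically be exposed through the application server rather than a main method.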
Environment: MapR, Java, HTML, Java Script, SQL Server, PL/SQL, JSP, Spring, Hibernate, Web Services, SOAP, SOA, JSF, Java, JMS, Junit, Oracle, Eclipse, SVN, XML, CSS, Log4j, Ant, Apache Tomcat.
JAVA DEVELOPER
Confidential
Responsibilities:
- Involved in projects utilizing Java and Java EE web applications to create fully integrated client management systems.
- Developed the UI using HTML, JavaScript and JSP, and developed business logic and interfacing components using Business Objects, JDBC and XML.
- Participated in user requirement sessions to analyze and gather business requirements.
- Developed the user-facing site using Perl, back-end admin sites using Python and big data components using core Java.
- Involved in development of the application using Spring Web MVC and other components of the Spring Framework.
- Elaborated use cases based on business requirements and was responsible for creating class diagrams and sequence diagrams.
- Implemented object-relational mapping in the persistence layer using the Hibernate (ORM) framework.
- Implemented REST web services with the Jersey API to handle customer requests (a minimal resource sketch follows this list).
- Experienced in developing RESTful web services, both consuming and producing them.
- Used Hibernate for the Database connection and Hibernate Query Language (HQL) to add and retrieve the information from the Database.
- Implemented Spring JDBC to connect to the Oracle database.
- Designed the application using the MVC framework for easy maintainability.
- Provided bug fixing and testing for existing web applications.
- Involved in the full system life cycle and responsible for development, testing and implementation.
- Involved in Unit Testing, Integration Testing and System Testing.
- Implemented Form Beans and their Validations.
- Wrote Hibernate components.
- Developed client-side validations with JavaScript.
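A minimal sketch of a Jersey (JAX-RS) resource like the REST services mentioned above; the /customers path, CustomerResource class and JSON payload are illustrative assumptions.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Hypothetical JAX-RS resource; the path and payload are illustrative only.
@Path("/customers")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getCustomer(@PathParam("id") String id) {
        // In the real application the entity would be fetched through the service/Hibernate layer.
        String json = "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
        return Response.ok(json).build();
    }
}
```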
Environment: Spring, JSP, Servlets, REST, Oracle, AJAX, JavaScript, jQuery, Hibernate, WebLogic, Log4j, HTML, XML, CVS, Eclipse, SOAP Web Services, XSLT, XSD, UNIX, Maven, Mockito, JUnit, Jenkins, shell scripting, MVS, ISPF.