Hadoop Developer Resume
Minneapolis, MN
PROFESSIONAL SUMMARY:
- Around 8 years of experience in the IT industry, including 4+ years in Big Data alongside Java/J2EE.
- Experience with Hadoop distributions: Cloudera and Hortonworks.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience with job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
- Extensive knowledge of SQL queries for backend database analysis.
- Experience in moving data between HDFS and relational database systems (RDBMS) in both directions using Sqoop.
- Led many data analysis and integration efforts involving Hadoop along with ETL.
- Extensive experience with SQL, PL/SQL and database concepts.
- Transferred bulk data from RDBMS systems such as Teradata and Netezza into HDFS using Sqoop.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Well-versed in Agile and other SDLC methodologies; able to coordinate with product owners and SMEs.
- Worked on different operating systems including UNIX, Linux, and Windows.
- Good working experience with Spark (Core, SQL, Streaming) and Apache Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, Flume, HBase, and ZooKeeper.
- Diverse experience using Java tools in business, web, and client-server environments, including Java Platform, Enterprise Edition (Java EE), Enterprise JavaBeans (EJB), JavaServer Pages (JSP), Java Servlets (including JNDI), Struts, and Java Database Connectivity (JDBC) technologies.
- Fluid understanding of multiple programming and markup languages, including C#, C, C++, JavaScript, HTML, and XML.
- Experience in web application design using open-source MVC frameworks such as Spring and Struts.
- Worked on setting up Apache NiFi and building NiFi dataflows to orchestrate data pipeline activities.
- Strong experience with NoSQL columnar databases such as HBase and Cassandra and their integration with Hadoop clusters.
- Wrote UDFs and integrated them with Hive and Pig (a related Spark SQL sketch follows this summary).
- Experience with SequenceFile, Avro, and ORC file formats and compression.
- Hands-on experience with an enterprise data lake supporting use cases including analytics, processing, storage, and reporting of voluminous, rapidly changing structured and unstructured data.
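The following is a minimal, hypothetical Scala sketch of the kind of Spark SQL work referenced above: registering a simple UDF (a Spark SQL UDF here, rather than a Hive or Pig UDF), applying it in a HiveQL-style query against a Hive table, and writing the result as compressed ORC. Table, column, and path names are illustrative only, not taken from any project.

import org.apache.spark.sql.SparkSession

object SummarySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("summary-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Simple UDF that normalizes free-text region codes (hypothetical logic)
    spark.udf.register("norm_region",
      (s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

    // HiveQL-style aggregation using the registered UDF; table name is assumed
    val result = spark.sql(
      "SELECT norm_region(region) AS region, COUNT(*) AS cnt " +
      "FROM customer_events GROUP BY norm_region(region)")

    // ORC output with zlib compression; output path is assumed
    result.write.mode("overwrite")
      .option("compression", "zlib")
      .orc("hdfs:///lake/curated/region_counts/")

    spark.stop()
  }
}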
TECHNICAL SKILLS:
Big Data Core Services: Spark (Core, SQL, Streaming, ML), HDFS, MapReduce, YARN
Hadoop Distributions: Hortonworks, Cloudera, Apache
NoSQL Databases: HBase, Cassandra, MongoDB
Hadoop Data Services: Hive, Impala, Pig, Sqoop, Flume
Hadoop Operational Services: Zookeeper, Oozie
Monitoring Tools: Cloudera Manager, Ambari, Nagios.
Cloud Computing Tools: Amazon AWS, EC2, S3, EMR
Languages: C, Java/J2EE, Python, SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Databases: Oracle, MySQL, PostgreSQL, Teradata, Netezza
Operating Systems: UNIX, Linux, Windows
Build Tools: Jenkins, Maven, ANT
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans
Development methodologies: Agile/Scrum
Visualization and Analytics Tools: Tableau, QlikView
PROFESSIONAL EXPERIENCE:
Confidential, Minneapolis, MN
Hadoop Developer
Responsibilities:
- Optimized Hadoop MapReduce code and Hive and Pig scripts for better scalability, reliability, and performance.
- Developed Oozie workflows for application execution.
- Migrated data from legacy RDBMS databases to HDFS using Sqoop.
- Wrote Pig scripts for data processing.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Implemented Hive tables and HiveQL queries for reports.
- Executed HiveQL queries in Spark using Spark SQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch following this list).
- Involved in developing shell scripts and automating end-to-end data management and integration work.
- Performed data validation using Hive dynamic partitioning and bucketing.
- Wrote and used complex data types to store and retrieve data with HiveQL in Hive.
- Developed Hive queries to analyze reducer output data.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Heavily involved in designing the next-generation data architecture for unstructured data.
- Developed Pig Latin scripts to extract data from source systems.
- Created and maintained technical documentation for executing Hive queries and Pig scripts.
- Involved in extracting data from Hive and loading it into an RDBMS using Sqoop.
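A minimal sketch of converting a HiveQL report query into Spark, as referenced above: the same aggregation is run once through Spark SQL and once as RDD transformations in Scala. The table and column names (sales, region, amount) are hypothetical.

import org.apache.spark.sql.SparkSession

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-example")
      .enableHiveSupport()
      .getOrCreate()

    // HiveQL executed via Spark SQL
    val byRegionSql = spark.sql(
      "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

    // Equivalent logic expressed as Spark RDD transformations
    val byRegionRdd = spark.table("sales").rdd
      .map(row => (row.getAs[String]("region"), row.getAs[Double]("amount")))
      .reduceByKey(_ + _)

    byRegionSql.show()
    byRegionRdd.take(10).foreach(println)
    spark.stop()
  }
}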
Environment: HDFS, MapReduce, MySQL, Spark, Cassandra, Hive, HBase, Oozie, Pig, ETL, Hortonworks (HDP 2.0), Shell Scripting, Linux, Sqoop, Flume, Oracle 11g, Maven, Ant, JUnit, MRUnit, SVN, Jira.
Confidential, Nashville, TN
Hadoop Developer
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop and Kafka, in both batch and streaming modes.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which consumes data from Kafka in near real time and persists it into HBase (see the sketch following this list).
- Analyzed the performance of Spark streaming and batch jobs using Spark tuning parameters.
- Enhanced and optimized Spark/Scala jobs to aggregate, group, and run data mining tasks using the Spark framework.
- Installed and configured Apache NiFi and developed various pipeline activities using processors such as the Sqoop, Kafka, HDFS, and file processors.
- Created data pipelines per business requirements and scheduled them using Oozie.
- Used Hive to join multiple tables from a source system and load them into Elasticsearch tables.
- Involved in complete Big Data flow of the application data ingestion from upstream to HDFS, processing the data in HDFS and analyzing the data using several tools.
- Imported data in various formats such as JSON, SequenceFile, text, CSV, Avro, and Parquet into the HDFS cluster, compressed for optimization.
- Ingested data from RDBMS sources such as Oracle, SQL Server, and Teradata into HDFS using Sqoop.
- Configured Hive, wrote Hive UDFs and UDAFs, and created static and dynamic partitions with bucketing.
- Managed and reviewed large Hadoop log files.
- Designed and created analytical reports and automated dashboards to help users identify critical KPIs and facilitate strategic planning in the organization.
- Involved in cluster maintenance, monitoring, and troubleshooting.
- Maintained technical documentation for every step of the development environment and for launching Hadoop clusters.
- Worked with BI tools such as Tableau to create weekly, monthly, and daily report dashboards in Tableau Desktop and publish them against data on the HDFS cluster.
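An illustrative sketch of the Kafka-to-HBase streaming flow described above, assuming hypothetical broker, topic, table, and column-family names: Spark Streaming consumes events from Kafka and writes them to HBase with one connection per partition.

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHBaseExample {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-to-hbase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",            // assumed broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model")

    // Direct stream from an assumed "learner_events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner_events"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One HBase connection per partition; "learner" table and "d" family are hypothetical
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("learner"))
        records.foreach { rec =>
          val rowKey = Option(rec.key()).getOrElse(rec.value().hashCode.toString)
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event"), Bytes.toBytes(rec.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Opening the HBase connection inside foreachPartition (rather than on the driver) keeps the client non-serializable objects off the closure and amortizes connection cost across each partition's records.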
Environment: Spark, Scala, HBase, Cassandra, Kafka, NiFi, Hadoop, HDFS, Hive, Oozie, Sqoop, Elasticsearch, Shell Scripting, Python, Tableau, Oracle, MySQL, Teradata, Log4j, JUnit, MRUnit, Jenkins, Maven, Git, SVN, JIRA, and AWS.
Confidential, Naples, FL
Big Data Developer
Responsibilities:
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Involved in data modeling and replication strategies in Cassandra.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch following this list).
- Handled Hive queries using Spark SQL integrated with the Spark environment.
- Along with the infrastructure team, involved in designing and developing a Kafka and Spark Streaming based real-time data pipeline.
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Sqoop on a regular basis to load data from MySQL and other sources into HDFS.
- Wrote multiple MapReduce programs for extraction, transformation, and aggregation of data from multiple file formats including XML, JSON, CSV, and other compressed formats.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Involved in loading data from the Linux file system into HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.
- Defined job flows and developed simple to complex MapReduce jobs as per requirements.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Pig UDFs for manipulating data according to business requirements and worked on developing custom Pig loaders.
- Hands-on experience setting up an HBase column-based storage repository for archival and retro data.
- Responsible for creating Hive tables based on business requirements.
- Used the enterprise data lake to support use cases including analytics, processing, storage, and reporting of voluminous, rapidly changing structured and unstructured data.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
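A minimal sketch of the multi-format, in-memory processing referenced above, under assumed HDFS paths and field names: JSON and CSV sources are loaded, cached for in-memory computation, joined and aggregated, and written out as Snappy-compressed Parquet.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiFormatAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-format-aggregation").getOrCreate()

    // Hypothetical input locations on HDFS
    val events = spark.read.json("hdfs:///data/raw/events_json/")
    val lookup = spark.read.option("header", "true").csv("hdfs:///data/raw/account_csv/")

    // In-memory computation: join the sources, then aggregate per account
    val joined = events.join(lookup, Seq("account_id")).cache()
    val summary = joined.groupBy("account_id").agg(count(lit(1)).as("event_count"))

    // Compressed Parquet output to an assumed curated zone
    summary.write
      .mode("overwrite")
      .option("compression", "snappy")
      .parquet("hdfs:///data/curated/event_summary/")

    spark.stop()
  }
}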
Environment: Spark, Scala, Kafka, Cassandra, Flume, Apache Hadoop 2.x, Cloudera, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oracle 11g/10g, Linux, XML, MySQL, Jenkins, Maven, Git.
Confidential, Hartford, CT
Hadoop/Java Developer
Responsibilities:
- Involved in importing data from Microsoft SQL Server, MySQL, and Teradata into HDFS using Sqoop.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
- Used Hive to analyze partitioned and bucketed data and compute various reporting metrics.
- Involved in creating Hive tables, loading data, and writing queries that run internally as MapReduce jobs.
- Involved in creating Hive External tables for HDFS data.
- Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
- Used Spark for transformations, event joins, and some aggregations before storing the data into HDFS (see the sketch following this list).
- Troubleshot and resolved data quality issues and maintained a high level of accuracy in the data being reported.
- Analyzed large data sets to determine the optimal way to aggregate.
- Worked on the Oozie workflow to run multiple Hive and Pig jobs.
- Involved in creating Hive UDFs.
- Developed automated shell scripts to execute Hive queries.
- Involved in processing ingested raw data using Apache Pig.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked with different file formats such as JSON, Avro, ORC, and Parquet and compression codecs such as Snappy, zlib, and LZ4.
- Gained knowledge in creating Tableau dashboards for reporting on analyzed data.
- Expertise with NoSQL databases like HBase.
- Experienced in managing and reviewing the Hadoop log files.
- Used GitHub as the repository for committing and retrieving code, and Jenkins for continuous integration.
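An illustrative sketch of the partitioned-Hive analysis and Spark joins/aggregations mentioned above; the table, partition column, and output path are assumed names. A partition filter prunes the Hive partitions that are scanned before the join and aggregation, and the result is written to HDFS as ORC.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PartitionedHiveReporting {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-hive-reporting")
      .enableHiveSupport()
      .getOrCreate()

    // Filter on the assumed load_date partition column prunes scanned partitions
    val claims = spark.sql(
      "SELECT claim_id, member_id, amount FROM claims WHERE load_date = '2016-01-31'")
    val events = spark.table("claim_events")

    // Event join plus aggregation before persisting to HDFS
    val metrics = claims.join(events, Seq("claim_id"))
      .groupBy("member_id")
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("event_count"))

    metrics.write.mode("overwrite").orc("hdfs:///reporting/member_metrics/")
    spark.stop()
  }
}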
Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, Cloudera Manager, MySQL, Eclipse, Git, GitHub, Jenkins.
Confidential, Ellicott City, MD
Java Developer
Responsibilities:
- Analyzed business requirements, performed gap analysis, and transformed them into detailed design specifications.
- Involved in design process using UML & RUP (Rational Unified Process).
- Performed Code Reviews and responsible for Design, Code and Test signoff.
- Assisted the team in development, clarified design issues, and fixed defects.
- Involved in designing test plans and test cases and in overall unit and integration testing of the system.
- Developed business-tier logic using stateful and stateless session beans.
- Developed Web Services using JAX-RPC, JAXP, WSDL, SOAP, XML to provide facility to obtain quote, receive updates to the quote, customer information, status updates and confirmations.
- Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.
- Expert in writing, configuring, and maintaining Hibernate configuration files and in writing and updating Hibernate mapping files for each Java object to be persisted.
- Expert in writing Hibernate Query Language (HQL) and tuning Hibernate queries for better performance.
- Used the design patterns such as Session Façade, Command, Adapter, Business Delegate, Data Access Object, Value Object and Transfer Object.
- Deployed the application in WebLogic and used WebLogic Workshop for development and testing.
- Involved in application performance tuning (code refactoring).
- Wrote test cases using JUnit, following test-first development.
- Wrote build files using Ant and used Maven in conjunction with Ant to manage builds.
Environment: EJB, Web Services, Hibernate, Struts, JSP, JMS, JNDI, JDBC, WebLogic, SQL, PL/SQL, Oracle, Sybase, XML, XSLT, WSDL, SOAP, UML, Rational Rose, WebLogic Workshop, OptimizeIt, Ant, JUnit, ClearCase, PVCS, ClearQuest, Windows XP, Linux.