Big-data Engineer/ Hadoop Developer Resume
Cincinnati, OH
SUMMARY:
- 5+ years of IT experience in various industries, including 4 years of hands-on experience developing Big Data and Hadoop applications.
- Strong technical foundation with in-depth knowledge of Big Data Hadoop, data reporting, data design, data analysis, data governance, data integration and data quality.
- Experience in setting up, configuring and monitoring Hadoop clusters on Cloudera and Hortonworks distributions.
- Deep and extensive knowledge of HDFS, Spark, Apache NiFi, MapReduce, Pig, Hive, HBase, Sqoop, Storm, YARN, Flume, Oozie, Zookeeper, Cassandra, MongoDB, etc.
- Thorough knowledge of Hadoop architecture and its components, such as HDFS, NameNode, DataNode, Application Master, Resource Manager, Node Manager, Job Tracker, Task Tracker, and the MapReduce programming paradigm.
- Good understanding of Hadoop MR1 and MR2 (YARN) architecture.
- Experience in analyzing data using HiveQL, Pig Latin and MapReduce programs in Java.
- Expertise in writing MapReduce programs and UDFs for both Hive and Pig in Java. Extended Hive and Pig core functionality by using custom UDFs.
- Experience in developing scalable solutions using NoSQL databases including HBase, Cassandra, MongoDB and CouchDB.
- Extracted files from NoSQL databases such as CouchDB and HBase through Flume and placed them in HDFS for processing.
- Proficient with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing strategies, and writing and optimizing HiveQL queries.
- Experienced in performing analytics on structured data using Hive queries, operations, joins, query tuning, SerDes and UDFs.
- Good experience working with different Hadoop file formats like SequenceFile, RCFile, ORC, Avro and Parquet.
- Experience in using modern Big-Data tools like SparkSQL to convert schema-less data into more structured files for further analysis.
- Experience in Spark Streaming to receive real time data and store the stream data into HDFS.
- Experienced in building Storm topologies, spouts and bolts to stream data from sources and pre-process it.
- Extensive experience working with different Spark modules like Spark transformations, MLlib, GraphX, Spark Streaming and Spark SQL.
- Good experience in writing Map Reduce jobs using Java native code, Pig, Hive for various business use cases.
- Experience in processing data serialization formats like XML, JSON and Sequence Files.
- Experience in working with Apache Sqoop to import and export data to and from HDFS and Hive.
- Good working experience in designing Oozie workflows for cleaning data and storing into Hive tables for quick analysis.
- Good knowledge of streaming data from multiple sources into HDFS using Flume and Kafka.
- Knowledge of processing and analyzing real-time data streams/flows using Kafka and HBase.
- Experience with Informatica Power Center Big Data Edition (BDE) for high-speed Data Ingestion and Extraction.
- Hands-on experience with Amazon EMR, Cloudera (CDH4 & CDH5), and Hortonworks Hadoop distributions.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, Spark, Kafka, NiFi, MapReduce, Pig, Hive, Impala, HBase, Elasticsearch, Cassandra, Sqoop, Oozie, Zookeeper, Flume, Storm, YARN, MongoDB, Ranger, Mahout, Falcon, Avro, AWS.
Java & J2EE Technologies: Core Java, Hibernate, Spring, JSP, Servlets, Java Beans, JDBC, EJB 3.0, JMS, JMX, RMI.
IDE Tools: Eclipse, IntelliJ.
Programming languages: Java, Python, Scala, C, C++, MATLAB, SAS, PHP, SQL, PL/SQL.
Web Services & Technologies: XML, HTML, XHTML, JNDI, HTML5, AJAX, JQuery, JSON, CSS, JavaScript, AngularJS, VB Script, WSDL, SOAP and RESTful.
ETL tools: Pentaho, Talend, Informatica (MDM, IDQ, TPT), Teradata.
Databases: Oracle, SQL Server, MySQL, DB2, NoSQL.
Application Servers: Apache Tomcat, WebLogic, WebSphere, JBoss.
Tools: Maven, SBT, Ant, JUnit, Log4j.
Operating Systems: Windows, UNIX, Linux, Mac OS.
PROFESSIONAL EXPERIENCE:
Confidential, Cincinnati, OH
Big-Data Engineer/ Hadoop Developer
Responsibilities:
- Installed and configured Apache Hadoop clusters using YARN for application development, along with Apache toolkits like Apache Hive, Apache Pig, HBase, Apache Spark, Zookeeper, Flume, Kafka and Sqoop.
- Successfully developed and deployed many modules using Spark, Hive, Sqoop, Shell, Pig, Scala and Python.
- Successfully launched data transfer between Databases and HDFS with Sqoop, and used Flume in parallel to stream the log data from servers.
- Converted Hive and SQL queries to Spark jobs using Spark RDDs in Scala and Python.
- Designed and deployed multiple POCs using Scala on a YARN cluster, and checked the performance of Spark with Cassandra and SQL.
- Involved in data loading from UNIX file system to HDFS.
- Generated Sqoop scripts for data ingestion into the Hadoop environment.
- Implemented Spark API over YARN to achieve data analytics in Hive DB.
- Created and scheduled multiple tasks for incremental load into staging tables.
- Loaded log data and data from UI apps into the Hadoop data lake using Apache Kafka.
- Transformed data and performed data quality checks before loading onto HDFS with Pig.
- Created Hive External tables in partitioned format to load the processed data obtained from MapReduce.
- Ran analytical algorithms on HDFS data using MapReduce programs.
- Merged data from different sources using Hive joins and performed ad hoc queries.
- Designed Hive Generic UDFs to perform record level business logic operations.
- Implemented Data classification algorithms using MapReduce design patterns.
- Worked extensively with combiners, partitioning and the distributed cache to improve the performance of MapReduce jobs.
- Designed a workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and used Zookeeper to coordinate the clusters.
- Successfully handled different file formats such as Text, Avro and Parquet, with Snappy, bz2 and gzip compression.
- Implemented test scripts to support test driven development and continuous integration.
- Gained experience with NoSQL databases like HBase and Cassandra.
- Troubleshot the cluster by reviewing Hadoop log files.
- Worked on multiple data formats on HDFS using Spark.
- Used Zookeeper for various types of centralized configurations.
Environment: Hadoop 2.3.0, Spark Core, Cassandra, Spark SQL, SparkR, PySpark, Hive, Pig, Sqoop, Zookeeper, Control-M, Java, and UNIX Shell Scripting.
Confidential, New York
Hadoop Developer
Responsibilities:
- Built a workflow to export Cassandra column family data to CSV and loaded the data into Pig. Used the Avro data serialization system to work with JSON data formats.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
- Involved in managing deployments using xml scripts.
- Developed Spark SQL scripts and converted Hive UDFs to Spark SQL UDFs.
- Performed operations on data stored in HDFS and other NoSQL databases in both batch-oriented and ad-hoc contexts.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing. Involved in loading data from the Linux file system to HDFS.
- Ran batch processes using Pig scripts and developed Pig UDFs for data manipulation per business requirements.
- Accessed Hive tables from Java applications using JDBC to perform analytics.
- Used the partitioning pattern in MapReduce to move records into categories.
- Commissioned and decommissioned nodes in the Hadoop cluster.
- Imported and exported data between RDBMS and HDFS using data ingestion tools like Sqoop.
- Performed unit testing with JUnit and integration testing in the staging environment.
- Followed Agile & Scrum principles in developing the project.
- Handled integration, and reviewed and refined code written by team members. Mentored team members and resolved critical technical issues they faced.
- Environments: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Cloudera Distribution, Cassandra, HTML, JavaScript, XML, XSLT, JQuery, AJAX, Web Services.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Jira, Python, SQL, Cloudera Manager, Spark, AWS, Cassandra, Pig, Sqoop, Oozie, ZooKeeper, Storm, Flume, Azkaban, Solr, Talend Open Studio, Teradata, Scala, PL/SQL, MySQL, NoSQL, Elasticsearch, Windows, Hortonworks, HBase
Confidential
Hadoop Developer
Responsibilities:
- Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed the cluster, commissioned and decommissioned DataNodes, performed NameNode recovery and capacity planning, configured Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on installation planning and slot configuration.
- Implemented NameNode backup using NFS for high availability.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Created Hive tables, loaded data, and wrote Hive UDFs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Involved in migrating ETL processes from Oracle to Hive to test ease of data manipulation.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: Hadoop, NoSQL, Cassandra, MongoDB, Sqoop, HDFS, HBase, Oozie, Pig Latin, Hive, Flume, MapReduce, Java, Eclipse, NetBeans.
Confidential
Java Developer
Responsibilities:
- Actively involved in Analysis, Detail Design, Development, System Testing and User Acceptance Testing.
- Developed an intranet web application using J2EE architecture, with JSP for the user interfaces, JSP tag libraries for custom tags, and JDBC for database connectivity.
- Implemented the Struts framework (MVC): developed the Action Servlet and Action Form bean, configured the struts-config descriptor, and implemented the validator framework.
- Extensively involved in database designing work with Oracle Database and building the application in J2EE Architecture.
- Integrated messaging with MQSeries classes for JMS, which provide an XML message-based interface. The application uses the JMS publish-and-subscribe model.
- Developed the EJB session bean that acts as a facade and accesses the business entities through their local home interfaces.
- Evaluated and worked with the EJB container-managed persistence (CMP) strategy.
- Used web services (WSDL and SOAP) to get loan information from a third party, and used SAX and DOM XML parsers for data retrieval.
- Wrote DTDs for XML document exchange. Generated, parsed and displayed XML in various formats using XSLT and CSS.
- Used the SVN version control system for source code and project management.
- Used XPath 1.0 for selecting nodes and XQuery to extract and manipulate data from XML documents.
- Coded, tested and deployed the web application using RAD 7.0 and WebSphere Application Server 6.0.
- Used JavaScript for client-side data validation.
- Wrote unit tests for the implemented bean code using JUnit.
- Extensively worked on UNIX Environment.
- Exchanged data in XML format to support interoperability with other software applications.
Environment: Struts 2, JMS, EJB, JSP, RAD 7.0, WebSphere Application Server 6.0, XML parsers, XSLT, XQuery, XPath 1.0, HTML, CSS, JavaScript, IBM MQSeries, JBoss, ANT, JUnit, SVN, JDBC, Oracle, Unix.
