Big Data Engineer Resume
Des Moines, IA
SUMMARY:
- 8+ years of overall IT experience, including around 4 years of Big Data experience in programming, data ingestion, storage, querying, processing and analysis.
- In-depth understanding and knowledge of Hadoop architecture and its components such as HDFS, MapReduce, Job Tracker, Task Tracker, NameNode, DataNode, Resource Manager and Node Manager.
- Well versed in installing, configuring, administering and tuning Hadoop clusters on major Hadoop distributions: Cloudera CDH 3/4/5, Hortonworks HDP 2.3/2.4 and Amazon Web Services (AWS).
- Experience in Big Data technologies and Hadoop ecosystem projects such as HDFS, MapReduce, YARN, Spark, Hive, NoSQL databases, HBase, Oozie, Sqoop, Pig, Storm, Kafka, Impala, HCatalog, ZooKeeper, Flume and Amazon Web Services.
- Hands-on experience using YARN and tools like Pig and Hive for data analysis, Sqoop for data ingestion, Oozie for scheduling and ZooKeeper for coordinating cluster resources.
- Experienced in Amazon Web Services (AWS) Cloud services such as EMR, EC2, S3, EBS and IAM roles for users.
- Solid knowledge in importing and exporting data between databases such as Oracle and MySQL and HDFS using Sqoop.
- Knowledge of Spark SQL scripting, Mesos architecture and Teradata.
- Integrated Kafka and Storm, using Avro for serializing and deserializing data with Kafka producers and consumers (a minimal sketch follows this summary).
- Experience in writing custom MapReduce jobs and User Defined Functions (UDFs) for evaluation, filtering, loading and storing data in both Hive and Pig.
- Experience in collecting log data from various sources, integrating it into HDFS using Flume and staging it in HDFS for further analysis.
- Experience in developing Oozie workflows for scheduling and orchestrating the ETL process.
- Hands-on experience in working with NoSQL databases like HBase and Cassandra.
- Experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs in Java.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Solid understanding of building N-Tier web-enabled applications using Struts, Spring MVC, Hibernate, JSP and Servlets.
- Hands-on knowledge of REST and SOAP web services, XSD, WSDL, XML.
- Familiar with SDLC methodologies such as Agile, Waterfall and Spiral.
- Extensive experience with object-oriented concepts and with both core Java and J2EE technologies, including JDBC, JSP, Servlets, Web Services, EJB, JNDI, JMS, XML, XSLT, AJAX and frameworks such as Hibernate, Spring and Struts.
- Experience in client side Technologies such as HTML, DHTML, CSS, JavaScript, AJAX, jQuery, JSON.
- Familiar with build tools such as Ant and Maven.
- Proficient in unit testing the application using JUnit and logging the application using Log4J.
- Experienced in deploying J2EE applications over servers like Apache Tomcat, JBoss 4.2.3 and Web Sphere 6.0 servers.
- Experience in writing SQL queries and stored procedures for accessing and managing databases such as Oracle 11g/10g/9i, MS SQL Server 2012/2008/2005 and MS Access.
- Extensive hands-on experience with version control tools such as Git, CVS, SVN and ClearCase, and with IDEs such as Eclipse, IntelliJ and NetBeans; worked with Linux (Red Hat, Ubuntu) and Windows operating systems.
- Outstanding analytical and technical problem solving skills and ability to learn and adapt quickly to the emerging new technologies and paradigms.
- Collaborated with technical team members to resolve back-end/front-end integration issues.
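
The following is a minimal, illustrative sketch of the Kafka/Avro pattern referenced above (producer side only; the Storm consumer is omitted). The topic name, record schema and broker address are hypothetical, not taken from any engagement described here.

    import java.io.ByteArrayOutputStream;
    import java.util.Properties;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AvroKafkaProducerSketch {
        // Illustrative schema; real field names would come from the project.
        private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"Click\",\"fields\":["
          + "{\"name\":\"userId\",\"type\":\"string\"},"
          + "{\"name\":\"url\",\"type\":\"string\"}]}";

        public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

            // Build one Avro record and serialize it to bytes.
            GenericRecord click = new GenericData.Record(schema);
            click.put("userId", "u123");
            click.put("url", "/home");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(click, encoder);
            encoder.flush();

            // Publish the serialized record; broker address and topic are placeholders.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("clickstream", click.get("userId").toString(), out.toByteArray()));
            }
        }
    }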
TECHNICAL SKILLS:
Languages: C, C++, Java, J2EE, PL/SQL, C#, Python
Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, HBase, Impala, Tez, ZooKeeper, Sqoop, Pentaho, Oozie, Apache Cassandra, Flume, Spark, Splunk, AWS, HCatalog, Kafka, Avro, Mesos, Teradata
Scripting Languages: Pig Latin, Python, UNIX/Linux shell scripting
Hadoop Clusters: Cloudera CDH 4/5, Hortonworks HDP 2.3/2.4, Amazon Web Services (AWS)
J2EE Technologies: JDBC, Spring, Servlets, Struts, JSP, Java, Web Services using SOAP, REST, WSDL, HTML, JavaScript, JQuery, XML, XSL, XSD, JSON, CSS, Hibernate.
Application Servers: IBM Web Sphere 7.0, BEA Web Logic 8.1, JBoss 4.0.0, Oracle, Docker
Web Servers: Apache Tomcat 6.0, Jetty Web Server 2.0
J2EE Frameworks: Apache Struts, Hibernate, Spring, AJAX.
Databases: MySQL, DB2, Oracle 11g/10g/9i, MS SQL Server 2012/2008/2005, MongoDB, Cassandra, HBase, MS Access
IDEs: Eclipse, NetBeans, IntelliJ.
Development Tools: Maven, TOAD, SQL Workbench, Ant
Operating Systems: Linux (Red Hat, CentOS, Ubuntu), UNIX, Mac OS, Sun Solaris and Windows
Version Control: SVN, CVS, ClearCase, GIT
WORK EXPERIENCE:
Confidential, Des Moines, IA
Big Data Engineer
Responsibilities:
- Migrated data from a Hortonworks cluster to an AWS EMR cluster to resolve storage constraints.
- Involved in running Hadoop jobs for processing millions of records, with data updated on a daily and weekly basis.
- Performed continuous data integration from mainframe systems into Amazon S3 via Attunity, an ETL tool.
- Documented the tasks performed and the issues encountered.
- Built a POC for loading data from the Linux file system to AWS S3 and HDFS.
- Worked on AWS to create, manage EC2 instances and Hadoop Clusters.
- Used Bedrock, a data management tool, to run MapReduce jobs on top of raw data and transform it to generate the desired output files.
- Developed shell scripts and used shell commands to automate day-to-day data flows.
- Created both internal and external tables in Hive and developed Pig scripts to preprocess the data for analysis.
- Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
- Designed appropriate partitioning/bucketing schemas in Hive to allow faster data retrieval during analysis.
- Worked on various file formats like AVRO, ORC, Text, CSV, Parquet using Snappy compression.
- Created PIG scripts to process raw structured data and developed Hive tables on top of it.
- Developed Python and Scala scripts and UDFs using both DataFrame/SQL and RDD/MapReduce APIs in Spark for data aggregation and queries, writing data back to the OLTP system directly or through Sqoop (a simplified sketch follows this list).
- Developed multiple POCs using PySpark, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL.
- Used TOAD to view the Hive tables, views and data.
- Connected Pentaho 7.0 to the target database to retrieve data.
- Used ETL to transfer the data from the target database into Pentaho and on to the MicroStrategy reporting tool.
- Used Zookeeper for various types of centralized configurations, GIT for version control, Maven as a build tool for deploying the code.
- Used Java collections in MapReduce programming.
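
A simplified sketch of the Hive partitioning/bucketing design and Spark aggregation described above. The project's actual scripts were written in Python and Scala; this illustration uses the Spark 1.6 Java API for consistency with the other sketches in this document, and the database, table and column names are placeholders.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.hive.HiveContext;

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.count;

    public class PartitionedHiveAggregationSketch {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("hive-agg-sketch"));
            HiveContext hive = new HiveContext(sc.sc());

            // Partition by load date and bucket by customer id so analytical scans
            // can prune partitions; table and column names are placeholders.
            hive.sql("CREATE TABLE IF NOT EXISTS analytics.orders ("
                   + "  order_id STRING, customer_id STRING, amount DOUBLE)"
                   + " PARTITIONED BY (load_date STRING)"
                   + " CLUSTERED BY (customer_id) INTO 32 BUCKETS"
                   + " STORED AS ORC");

            // DataFrame aggregation representative of a daily roll-up job.
            DataFrame orders = hive.table("analytics.orders");
            DataFrame daily = orders.groupBy(col("load_date"))
                                    .agg(count("order_id").alias("order_count"));
            daily.show();
        }
    }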
Environment: AWS, Hadoop 2.7.3-amzn-0, Hive 1.0.0, Pig 0.14.0, Hue 3.7.1, Spark 1.6.2, Python 2.7.12, Oozie-Sandbox 4.2.0, Tez 0.8.4, ZooKeeper-Sandbox 3.4.8, Sqoop-Sandbox 1.4.6, Scala 2.11.8, HCatalog 1.0.0, Linux 4.4.19-29, Bedrock 4.2.1, Pentaho 7.0, MariaDB, TOAD 1.3.0 Beta, Eclipse, Hortonworks, Java 1.8, Maven 3.3.9, GIT.
Confidential, Winston-Salem, NC
Hadoop Developer
Responsibilities:
- Interacted with client-side business users to discuss and understand ongoing enhancements and changes to upstream business data, and performed data analysis.
- Loaded and transformed large sets of structured and semi-structured data using Sqoop and staged it in HDFS for further processing.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Involved in transforming data from legacy tables to HDFS, and HBase tables using Sqoop.
- Solved computational problems using YARN.
- Experience in writing Map Reduce programs and using Apache Hadoop API for analyzing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study data patterns.
- Created custom UDFs for Hive and Pig.
- Developed Pig scripts to organize incoming data into a suitable structured form before passing it on for analysis.
- Involved in NoSQL (DataStax Cassandra) database design, integration and implementation.
- Created alter, insert and delete queries involving lists, sets and maps in DataStax Cassandra (a minimal sketch follows this list).
- Integrated bulk data into the Cassandra file system using MapReduce programs.
- Involved in creating data-models for customer data using Cassandra Query Language.
- Experienced in managing and reviewing Hadoop log files.
- Used Subversion for maintaining the component and for release and version management and JIRA for defect tracking.
- Worked with various file formats (Avro, Parquet, Text) and Hive SerDes, using gzip compression.
- Hands-on experience with Spark Streaming to receive real-time data from Kafka (see the streaming sketch after this list).
- Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
- Created Views, Sequences in the oracle database and modified the existing PL/SQL stored procedures.
- Experienced in importing real time logs to HDFS using Flume.
- Followed agile software development with Scrum methodology.
- Coordinated with the QA and infrastructure teams during the development and testing phases for a smooth rollout.
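
A minimal sketch of the Cassandra collection (set/list/map) operations described above, assuming the DataStax Java driver; the keyspace, table and data values are illustrative only.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CassandraCollectionsSketch {
        public static void main(String[] args) {
            // Contact point is a placeholder.
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            try {
                Session session = cluster.connect();
                session.execute("CREATE KEYSPACE IF NOT EXISTS customer_ks WITH replication ="
                              + " {'class': 'SimpleStrategy', 'replication_factor': 1}");
                // Set, list and map columns on a hypothetical customer table.
                session.execute("CREATE TABLE IF NOT EXISTS customer_ks.customers ("
                              + " id uuid PRIMARY KEY, emails set<text>,"
                              + " recent_orders list<text>, preferences map<text, text>)");
                session.execute("INSERT INTO customer_ks.customers (id, emails, preferences) VALUES ("
                              + " 5b6962dd-3f90-4c93-8f61-eabfa4a803e2, {'a@example.com'}, {'lang': 'en'})");
                // Collection updates: add to the set and append to the list.
                session.execute("UPDATE customer_ks.customers SET emails = emails + {'b@example.com'},"
                              + " recent_orders = recent_orders + ['ORD-1001']"
                              + " WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2");
                // Delete a single map entry.
                session.execute("DELETE preferences['lang'] FROM customer_ks.customers"
                              + " WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2");
            } finally {
                cluster.close();
            }
        }
    }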
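
A minimal sketch of receiving real-time data from Kafka with Spark Streaming, using the Spark 1.x direct-stream API that matches this environment; the broker address and topic name are placeholders.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import kafka.serializer.StringDecoder;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class KafkaSparkStreamingSketch {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("kafka-stream-sketch");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            // Broker list and topic name are placeholders.
            Map<String, String> kafkaParams = new HashMap<>();
            kafkaParams.put("metadata.broker.list", "localhost:9092");
            Set<String> topics = new HashSet<>(Arrays.asList("transactions"));

            JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                    kafkaParams, topics);

            // Count records per 30-second batch; a real job would parse and persist them.
            stream.count().print();

            jssc.start();
            jssc.awaitTermination();
        }
    }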
Environment: CDH 4/5 Hadoop cluster, MapReduce, YARN, HDFS, Hive, Java, Spark, HBase, Kafka, Cassandra, Pig, Zookeeper, Sqoop, Ambari, Flume, Oracle, Oozie, JIRA, Eclipse, and CVS.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Collected the business requirement from the subject matter experts like data scientists and business partners.
- Involved in Design and Development of technical specifications using Hadoop technologies.
- Load and transform large sets of structured, semi structured and unstructured data.
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats.
- Used different file formats such as Text files, Sequence Files and Avro.
- Involved in writing HQL queries and Hive UDFs for data analysis to meet business requirements (a minimal UDF sketch follows this list).
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Load data from various data sources into HDFS using Kafka.
- Wrote shell scripts to dump data from MySQL to HDFS.
- Involved in implementing and integrating NoSQL databases such as HBase.
- Worked on streaming the analyzed data into HBase using Sqoop, making it available for visualization and report generation by the BI team.
- Involved in scheduling Oozie workflow engine to run multiple Hive, Sqoop and pig jobs.
- Consumed data from Kafka topics using Storm.
- Used Oozie to automate/schedule business workflows which invoke Sqoop, MapReduce and Pig jobs as per the requirements.
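
A minimal Hive UDF sketch of the kind referenced above; the class name and the normalization it performs are illustrative only.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    /**
     * Minimal Hive UDF sketch: normalizes a free-text column before analysis.
     * The class name and transformation are placeholders.
     */
    public class NormalizeTextUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            // Lower-case and trim so grouping keys match consistently.
            return new Text(input.toString().trim().toLowerCase());
        }
    }

In Hive, such a UDF would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being used in an HQL query.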
Environment: HDP 2.4 Hadoop Cluster, MapReduce, HDFS, Hive, Java, Impala, HBase, Avro, Kafka, Pig, Zookeeper, Sqoop, Flume, Storm, Oozie, MySQL, Eclipse, and GIT.
Confidential, St. Louis, MO
Hadoop Developer
Responsibilities:
- Involved in full life-cycle of the project from Design, Analysis, logical and physical architecture modeling, development, Implementation, testing.
- Coordinated with business customers to gather business requirements.
- Analyzing business requirements and translating requirements into functional and technical design specifications.
- Used Oozie to automate/schedule business workflows which invoke Sqoop, MapReduce and Pig jobs as per the requirements.
- Wrote Scripts to generate Map Reduce jobs on the data in HDFS.
- Used Oozie to orchestrate the MapReduce jobs and worked with HCatalog to open up access to Hive's Metastore.
- Responsible for designing and managing the Sqoop jobs that import data from the data warehouse platform to HDFS.
- Developed MapReduce programs using Java to perform various transformations, cleaning and scrubbing tasks.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios (a minimal load sketch follows this list).
- Developed Pig UDFs to pre-process the data for analysis.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
- Optimized Map/Reduce jobs to use HDFS efficiently by using various compression mechanisms.
- Involved in loading data from UNIX file system to HDFS.
- Developed Hive queries for data sampling and analysis to the analysts.
- Participated in end-to-end agile SDLC.
- Interfaced with business analysts, architects, developers, testers and clients in completing the project.
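
A minimal sketch of loading a row into an HBase table using the Java client API of that era (HTable/Put); the table name, column family and row contents are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseLoadSketch {
        public static void main(String[] args) throws Exception {
            // Table, column family and row key are placeholders; the older
            // HTable/Put.add API matches the CDH 4 era referenced above.
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "portfolio_events");
            try {
                Put put = new Put(Bytes.toBytes("cust#42|2015-03-01"));
                put.add(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("unix-feed"));
                put.add(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("125.40"));
                table.put(put);
            } finally {
                table.close();
            }
        }
    }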
Environment: CDH 4, MapReduce, HDFS, Hive, Java, Flat files, HBase, MongoDB, Pig, HCatalog, Cassandra, Zookeeper, Sqoop, Flume, Oozie, Agile Methodology, Oracle, Eclipse and SVN.
Confidential, Columbus, OH
Java Developer
Responsibilities:
- Analyzed the design specs and business requirements and implemented the design using UML.
- Followed the Waterfall development life cycle for developing the application.
- Generated Java classes from the respective APIs so that they could be incorporated into the overall application.
- Wrote entities in Java along with named queries to interact with the database (a minimal entity sketch follows this list).
- Used AngularJS as the development framework to build a single-page application.
- Responsible for client-side UI validation and implementing business logic based on user selection using jQuery, jQuery UI and AngularJS.
- Deployed the application on the JBoss application server.
- Used JDBC and Hibernate for accessing Oracle database.
- Developed the build script for the project using Maven Framework.
- Developed server side code using Spring Dependency Injection.
- Navigation through various screens of application is implemented using Spring Web Flow.
- Used Hibernate to access the database using entity POJO classes.
- Created shell scripts and PL/SQL scripts that were executed daily to refresh data feeds from multiple systems.
- Used the Spring Web module integrated with JavaServer Faces (JSF).
- Developed pages using JSF features like converters, validators, actionListeners, custom components, etc.
- Involved in developing the REST based web services for various data formats including JSON.
- Implemented Log4J for logging purpose to debug the application.
- Used GIT tool for version control.
- Used PL/SQL stored procedures, triggers and cursors extensively to achieve effective results while dealing with huge transaction volumes.
- Extensive experience in working with Struts and Spring MVC (Model View Controller) architecture for developing applications using various Java/J2EE technologies like Servlets, JSP, JDBC, JSTL.
- Responsible for coding SQL statements for back end communication using JDBC.
- Involved in debugging and fixing production issues.
- Knowledge in UI development, UX design, web 2.0 specifications, visual design and team management.
- Created conditional logic in pages using JSF tags and JSTL.
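
A minimal sketch of an entity POJO with a named query, as described above; the entity, table and column names are illustrative, not taken from the application.

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.NamedQuery;
    import javax.persistence.Table;

    /** Illustrative entity with a named query; names are placeholders. */
    @Entity
    @Table(name = "ACCOUNTS")
    @NamedQuery(name = "Account.findByOwner",
                query = "select a from Account a where a.ownerName = :ownerName")
    public class Account {

        @Id
        private Long id;

        private String ownerName;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getOwnerName() { return ownerName; }
        public void setOwnerName(String ownerName) { this.ownerName = ownerName; }
    }

Such a query would typically be executed through the Hibernate Session (getNamedQuery) or a JPA EntityManager, with the parameter bound at runtime.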
Environment: Java/J2EE, JSP, Spring MVC, Spring Web Flow, jQuery, AngularJS, JSF, EJB, JBoss, Maven, Hibernate 3.0, SOAP UI, HTML, XML, PL/SQL, JDBC, GIT, REST, Oracle, Agile Methodology, NetBeans, dependency injection.
Confidential
Java Developer
Responsibilities:
- Worked in Spiral development environment.
- Design, implement and define components of the architecture strategy.
- Implement JUnit test cases for Struts/Spring components.
- Built SOAP Web services.
- Developed the server-side presentation layer using the Struts MVC framework.
- Developed ActionForm classes, form beans and Action classes using Struts (a minimal Action sketch follows this list).
- Participated in Server upgrades, code migrations, and also worked on important enhancements in the business requirements.
- Used Node.js and the AngularJS MVC framework in the development of web applications.
- Created and maintained server-side Node.js applications and APIs for multi-platform front-end developers.
- Performed Inheritance based OR mappings in tables to simplify the data in Hibernate.
- Used Hibernate for connecting to the database, with Oracle as its back-end.
- Interacted with the offshore team to coordinate and guide offshore development.
- Used Oracle database for SQL, PL/SQL scripts, stored procedures, functions, triggers, Oracle forms/Reports.
- Performance Tuning and Optimization with Java Performance Analysis Tool.
- Installed and configured WebSphere Application Server.
- Used CVS for check-out and check-in operations.
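
A minimal Struts 1.x Action sketch for the server-side presentation layer described above; the action name, request parameter and forward are placeholders, and the service call is stubbed out.

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import org.apache.struts.action.Action;
    import org.apache.struts.action.ActionForm;
    import org.apache.struts.action.ActionForward;
    import org.apache.struts.action.ActionMapping;

    /** Illustrative Struts 1.x Action; names and logic are placeholders. */
    public class SearchAction extends Action {
        @Override
        public ActionForward execute(ActionMapping mapping, ActionForm form,
                                     HttpServletRequest request, HttpServletResponse response) {
            // Read a request parameter (a populated Struts form bean could be used instead).
            String keyword = request.getParameter("keyword");
            // Stand-in for a real service/DAO call.
            request.setAttribute("results", "lookup for " + keyword);
            return mapping.findForward("success");
        }
    }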
Environment: Java, J2EE, Struts, WebSphere, Hibernate, JSP, Node.js, JavaScript, SOAP web services, Oracle 10g, CVS, Eclipse.
Confidential
Java Developer
Responsibilities:
- Closely involved with design, development and implementation of the application.
- Developed Class Diagrams, Sequence Diagrams as part of Module Design Documentation.
- Developed the web interface using Servlets, XSLT and JavaScript and a desktop application using Swing.
- Implemented database connectivity to Oracle using JDBC (a minimal sketch follows this list).
- Worked with IntelliJ IDEA and DB Visualizer.
- Used Hibernate to access the database using entity POJO classes.
- Implemented object/relational persistence using Hibernate for the domain model. Developed hbm files, Entity classes using annotations and used HQL to query the Database.
- Used SOAP as the protocol to send requests and responses in the form of XML messages.
- Used XML parser for retrieving information from server side calls.
- Worked in continuous integration environments under Scrum and Agile methodologies.
- Coded ANT scripts for compiling, building, packaging the jar files and deploying.
- Used Log4j logging framework in the application to store log messages.
- Involved in fixing the bugs at Development and production levels.
- Used the IntelliJ IDE and SVN for version control.
- Developed JUnit test cases for unit testing and integration testing.
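
A minimal sketch of JDBC connectivity to Oracle as described above; the JDBC URL, credentials and query are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    /** Illustrative Oracle JDBC lookup; connection details and SQL are placeholders. */
    public class OracleJdbcSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@dbhost:1521:ORCL";
            try (Connection conn = DriverManager.getConnection(url, "app_user", "app_password");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT customer_id, status FROM orders WHERE order_id = ?")) {
                ps.setLong(1, 1001L);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("customer_id") + " -> " + rs.getString("status"));
                    }
                }
            }
        }
    }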
Environment: Java, J2EE, Swing, XSLT, JavaScript, Hibernate, SOAP, JDBC, SQL, IntelliJ, ANT, Oracle 10g, JUnit, Agile Methodology, SVN, LINUX.