Big Data Engineer Resume
Golden Valley, MN
SUMMARY
- 8+ years of IT experience, including 5+ years with Big Data and Hadoop-related components such as HDFS, MapReduce, Pig, Hive, YARN, Sqoop, Flume, Crunch, Spark, Storm, Scala, and Kafka.
- Experience in multiple Hadoop distributions like Cloudera, MapR, and Hortonworks.
- Excellent understanding of NoSQL databases like HBase, Cassandra and MongoDB.
- Experience working with structured and unstructured data in various file formats such as XML, JSON, and sequence files using MapReduce programs.
- Work experience with cloud providers such as Amazon Web Services (AWS), Azure, and OpenStack.
- Good grasp of data warehousing fundamentals, with a proven ability to implement them; conversant with ETL processes.
- Implemented custom business logic and performed join optimization, secondary sorting, and custom sorting using MapReduce programs.
- Expertise in data ingestion using Sqoop, Apache Kafka, Spark Streaming, and Flume (for streaming data such as web server logs).
- Implemented business logic in Pig scripts and wrote custom Pig UDFs for data analysis; performed ETL operations in Pig to join, clean, aggregate, and analyze data.
- Experience performing data validation using Hive dynamic partitioning and bucketing.
- Extensive experience writing SQL queries in HiveQL to perform analytics on structured data.
- Experience with the Oozie Workflow Engine to automate and parallelize Hadoop MapReduce and Pig jobs.
- Worked with SQL, SQL*Plus, Oracle PL/SQL, stored procedures, table partitions, triggers, SQL queries, and PL/SQL packages, and loaded data into Data Warehouses/Data Marts.
- Experience working on Windows and UNIX/Linux platforms with technologies such as Big Data, SQL, XML, HTML, Core Java, and shell scripting.
- Experienced in importing and exporting data between RDBMS/Teradata and HDFS using Sqoop.
- Worked on MongoDB and Cassandra databases and related web services for storing data.
- Good knowledge of analyzing data with Python development and scripting for Hadoop Streaming.
- Experience implementing analytic algorithms in Spark, using Scala and Spark SQL for faster data processing (a minimal sketch follows this list).
- Working knowledge of visualization tools such as Tableau.
- Experience landing data from various sources into HDFS and AWS S3, creating tables on top of that data, and providing it to the analytics team for building Tableau reports.
- Extensive hands-on experience performing CRUD operations against HBase data using the Java API and implementing time-series data management.
- Expertise with application servers and web servers such as WebLogic, IBM WebSphere, Apache Tomcat, and JBoss, as well as VMware.
- Experienced in developing unit test cases using JUnit.
- Experience in using Maven for build automation.
- Experience in using version control and configuration management tools like SVN, CVS.
- Experience working in environments using Agile (SCRUM) and Waterfall methodologies.
- Expertise in database modeling, administration, and development using SQL and PL/SQL in Oracle, MySQL, DB2, and SQL Server.
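
To illustrate the Spark-with-Scala work noted above, a minimal sketch follows; the input path, view name, and columns (events.json, event_type) are hypothetical placeholders, not details from any specific engagement:

```scala
import org.apache.spark.sql.SparkSession

object EventCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EventCounts")
      .getOrCreate()

    // Read semi-structured JSON into a DataFrame (path is a placeholder).
    val events = spark.read.json("hdfs:///data/raw/events.json")
    events.createOrReplaceTempView("events")

    // Spark SQL aggregation over the registered view; column names are hypothetical.
    val counts = spark.sql(
      """SELECT event_type, COUNT(*) AS cnt
        |FROM events
        |GROUP BY event_type
        |ORDER BY cnt DESC""".stripMargin)

    counts.show()
    spark.stop()
  }
}
```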
TECHNICAL SKILLS
Hadoop/Big Data: Hadoop, HDFS, MapReduce, Hive, Pig, YARN, Sqoop, Flume, Oozie, Kafka, Spark, Scala, AWS
Methodologies: Agile, Waterfall
Languages: Scala, Python, Java
Application/Web Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic
Web Technologies: AngularJS, Node.js/Express, jQuery UI, Ajax, HTML/HTML5, CSS/CSS3, RESTful services, JavaScript, jQuery, Bootstrap, JSON
XML Technologies: XML, DOM
Databases: Oracle 10g/11g, PL/SQL, MongoDB, MySQL, MS SQL Server 2012, HBase
Build Tool: Ant, Maven
Web Services: RESTful, SOAP
Testing: JUnit
IDE Tools: Eclipse, NetBeans.
Version Control: SVN, CVS, Git
Operating Systems: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X
Other Tools: Jenkins, AWS, Azure, OpenStack
PROFESSIONAL EXPERIENCE
Confidential, Golden Valley, MN
Big Data Engineer
Responsibilities:
- Designed and deployed Hadoop clusters and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark, and Impala, on the Cloudera distribution.
- Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Pig, Hive, and HBase.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Developed workflows and coordinator jobs in Oozie.
- Developed Spark scripts using the Scala shell as per requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
- Improved the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Experience in deploying data from various sources into HDFS and building reports using Tableau.
- Performed real time analysis on the incoming data.
- Configured, deployed and maintained multi-node Dev and Test Kafka Clusters.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the RDD sketch after this list).
- Loaded data into HBase using both bulk and non-bulk loads.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
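
A minimal sketch of the RDD-based in-memory computation described above, assuming a hypothetical space-delimited web-log layout; the HDFS paths and field position are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogAggregation"))

    // Load raw log lines from HDFS into an RDD (path is a placeholder).
    val lines = sc.textFile("hdfs:///data/staging/weblogs/*")

    // In-memory computation: count requests per HTTP status code, assuming
    // a hypothetical layout with the status code in field 8.
    val statusCounts = lines
      .map(_.split(" "))
      .filter(_.length > 8)
      .map(fields => (fields(8), 1L))
      .reduceByKey(_ + _)

    statusCounts.saveAsTextFile("hdfs:///data/output/status-counts")
    sc.stop()
  }
}
```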
Environment: Hadoop, HDFS, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Kafka, HBase, Oozie, Java, SQL scripting, Linux shell scripting, Eclipse and Cloudera.
Confidential, O’Fallon, MO
Big Data Engineer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experienced in installing, configuring and using Hadoop Ecosystem components.
- Experienced in Importing and exporting data into HDFS and Hive using Sqoop.
- Knowledge in performance troubleshooting and tuning Hadoop clusters.
- Participated in development/implementation of Cloudera Hadoop environment.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the Hive sketch after this list).
- Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Implemented advanced Spark procedures such as text analytics and processing, using in-memory computing capabilities.
- Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
- Developed various transformations in Informatica and extensively supported data warehouse design and development.
- Installed and configured Hive, wrote Hive UDFs, and used JUnit for unit-testing MapReduce jobs.
- Wrote entities in Scala to interact with the database.
- Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files from Teradata to HDFS, and from HDFS into Hive and Impala.
- Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
- Used PiggyBank, a repository of user-defined functions for Pig Latin.
- Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experienced in managing and reviewing Hadoop log files.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Supported MapReduce programs running on the cluster and was involved in loading data from the UNIX file system into HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
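
A minimal sketch of the Hive dynamic-partitioning work above, expressed through Spark's HiveQL interface; the table names and schema are hypothetical, and bucketing would additionally be declared at DDL time with CLUSTERED BY ... INTO N BUCKETS:

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Dynamic partitioning must be switched on before the INSERT.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Partitioned target table; names and schema are illustrative.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_part (
        |  order_id STRING,
        |  amount   DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Hive resolves the order_date partition for each row from the
    // final column of the SELECT list (sales_raw is a placeholder source).
    spark.sql(
      """INSERT INTO TABLE sales_part PARTITION (order_date)
        |SELECT order_id, amount, order_date FROM sales_raw""".stripMargin)

    spark.stop()
  }
}
```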
Environment: Cloudera Hadoop, HDFS, Hive, Flume, HBase, Sqoop, PIG, Java JDK 1.6, Eclipse, MySQL, JSON, Apache Kafka, Spark, Ubuntu, Zookeeper.
Confidential, Richmond, VA
Hadoop Developer
Responsibilities:
- Responsible for gathering requirements from users and for designing use cases, technical designs, and implementations.
- Extensively worked on Spring and Hibernate Frameworks.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the MapReduce sketch after this list).
- Experience installing, configuring, and using Hadoop ecosystem components.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Developed the entire front end screens using AJAX, JSP, JSP Tag Libraries, CSS, HTML and JavaScript.
- Used JavaScript and jQuery for front end validations and functionalities.
- Participated in development/implementation of the Hortonworks Hadoop environment.
- Created a Node.js/Express server combined with Socket.io, building an MVC stack from AngularJS on the front end to MongoDB on the back end to provide broadcast and chat services.
- Contributed significantly in applying the MVC Design pattern using Spring.
- Implemented ActionForm classes for data transfer and server-side data validation.
- Performed unit testing with JUnit, along with system and integration testing.
- Involved in Maintenance and Bug Fixing.
- Used Eclipse as an IDE for developing application.
- Involved in the complete software development life cycle.
- Involved in unit testing and user documentation and used Log4j for creating the logs.
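
A minimal sketch of the kind of data-cleaning MapReduce job described above, written here in Scala against the Hadoop Java API; the delimiter and expected field count are hypothetical:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Map-only cleaning job: drop malformed pipe-delimited records before loading.
class CleaningMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  private val ExpectedFields = 12 // hypothetical record width

  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split('|')
    if (fields.length == ExpectedFields) ctx.write(NullWritable.get, value)
  }
}

object CleaningJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance()
    job.setJarByClass(classOf[CleaningMapper])
    job.setMapperClass(classOf[CleaningMapper])
    job.setNumReduceTasks(0) // map-only: cleaned records pass straight through
    job.setOutputKeyClass(classOf[NullWritable])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```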
Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, PIG, Java JDK 1.6, Eclipse, MySQL, JSON, Java Script, jQuery, LOG4j.
Confidential, NY
Hadoop Developer
Responsibilities:
- Exported data from DB2 to HDFS using Sqoop.
- Developed MapReduce jobs using Java API.
- Installed and configured Pig and also wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
- Worked on Cluster coordination services through Zookeeper.
- Worked on loading log data directly into HDFS using Flume.
- Involved in loading data from LINUX file system to HDFS.
- Responsible for managing data from multiple sources.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Implemented JMS for asynchronous auditing purposes.
- Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Experience defining, designing, and developing Java applications, especially with Hadoop MapReduce, by leveraging frameworks such as Cascading and Hive.
- Experience developing monitoring and performance metrics for Hadoop clusters.
- Experience documenting designs and procedures for building and managing Hadoop clusters.
- Strong experience troubleshooting operating system and cluster issues as well as Java-related bugs.
- Experienced in importing/exporting data between HDFS/Hive and relational databases, including Teradata, using Sqoop.
- Successfully loaded files to Hive and HDFS from MongoDB and Solr.
- Experience automating deployment, management, and self-serve troubleshooting of applications.
- Defined and evolved the existing architecture to scale with growth in data volume, users, and usage.
- Designed and developed a Java API (Commerce API) providing functionality to connect to Cassandra through Java services (see the Cassandra sketch after this list).
- Installed and configured Hive and wrote Hive UDFs.
- Experience managing CVS and migrating to Subversion.
- Experience managing development time, bug tracking, project releases, development velocity, release forecasting, and scheduling.
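
A minimal sketch of the Commerce API's Cassandra access path, shown in Scala against the DataStax Java driver; the contact point, keyspace, and orders schema are hypothetical placeholders:

```scala
import com.datastax.driver.core.Cluster

object CommerceDao {
  def main(args: Array[String]): Unit = {
    // Connect to the cluster (contact point and keyspace are placeholders).
    val cluster = Cluster.builder()
      .addContactPoint("127.0.0.1")
      .build()
    try {
      val session = cluster.connect("commerce")

      // Parameterized CQL read; the orders table and its columns are illustrative.
      val rs = session.execute(
        "SELECT order_id, total FROM orders WHERE customer_id = ?", "c-1001")
      val it = rs.iterator()
      while (it.hasNext) {
        val row = it.next()
        println(s"${row.getString("order_id")} -> ${row.getDecimal("total")}")
      }
    } finally cluster.close()
  }
}
```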
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, PIG, Eclipse, MySQL and Ubuntu, Zookeeper, Java (JDK 1.6).
Confidential
Java Developer
Responsibilities:
- Developed the UI using HTML, CSS, JavaScript, and AJAX.
- Used Oracle IDE to create web services for the EI application using a top-down approach.
- Worked on creating a basic Spring framework and a web-services-enabled environment for EI applications as a web service provider.
- Created a SOAP handler to enable authentication and audit logging during web service calls (see the handler sketch after this list).
- Created service layer APIs and domain objects using Struts.
- Wrote PL/SQL queries, created stored procedures, and invoked them using Spring JDBC.
- Used exception handling and multithreading for optimum application performance.
- Used Core Java concepts to implement the business logic.
- Created the high-level design document for web services and the EI common framework, and participated in review discussions with the client.
- Deployed and configured the database data source in the WebLogic application server, used Log4j for error tracking and debugging, and maintained the source code in Subversion.
- Used ClearCase for build management and Ant for application configuration and integration.
- Created, executed, and documented the tests necessary to ensure that an application and/or environment met performance requirements (technical, functional, and user interface).
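
A minimal sketch of the SOAP handler described above, shown in Scala against the JAX-WS handler API; a real handler would also verify credentials from the SOAP header rather than only logging:

```scala
import java.util.Collections
import javax.xml.namespace.QName
import javax.xml.ws.handler.MessageContext
import javax.xml.ws.handler.soap.{SOAPHandler, SOAPMessageContext}

class AuditHandler extends SOAPHandler[SOAPMessageContext] {
  override def handleMessage(ctx: SOAPMessageContext): Boolean = {
    val outbound = ctx
      .get(MessageContext.MESSAGE_OUTBOUND_PROPERTY)
      .asInstanceOf[java.lang.Boolean]
    // Write the raw SOAP envelope to the audit log (stdout here for brevity).
    println(if (outbound) "Outbound SOAP message:" else "Inbound SOAP message:")
    ctx.getMessage.writeTo(System.out)
    true // continue the handler chain
  }

  override def handleFault(ctx: SOAPMessageContext): Boolean = true
  override def close(ctx: MessageContext): Unit = ()
  override def getHeaders: java.util.Set[QName] = Collections.emptySet()
}
```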
Environment: Windows, Linux, Rational ClearCase, Java, JAX-WS, SOAP, WSDL, JSP, JavaScript, Ajax, Oracle IDE, Log4j, Ant, Struts, JPA, XML, HTML5, CSS3, Oracle WebLogic.
Confidential
Java Developer
Responsibilities:
- Developed Spring AOP code to configure logging for the application.
- Expertise in developing enterprise applications using the Struts framework.
- Developed the front end using JSF and portlets.
- Developed scalable applications using stateless session EJBs.
- Developed the UI panels using JSF, XHTML, CSS, Dojo, and jQuery.
- Used MySQL to access data in the database at different levels, making connections to the backend MySQL database (see the JDBC sketch after this list).
- Designed and developed web services using Apache Axis; wrote numerous session and message-driven beans running on JBoss and WebLogic.
- Worked with SDLC processes such as the Waterfall model and Agile methodology.
- Developed JSP interfaces using custom tags.
- Developed servlets and worked extensively with SQL.
- Used Ant to build the application and deployed it on the BEA WebLogic application server.
- Responsible for developing XML parsing logic using SAX/DOM parsers.
- Maintained a good network with EMC Documentum support teams, who helped resolve product issues and bugs.
- Worked on tickets from ServiceNow and Jira on a daily basis.
- Designed the front end using Swing.
- Deployed the application on Apache Tomcat.
- Designed online stores using ASP and JavaScript: developed custom storefront applications and custom user interfaces for client sites.
- Used J2EE to communicate with legacy COBOL-based mainframe implementations.
- Worked on PL/SQL and SQL queries.
- Developed JavaScript, ActionScript, and VBScript macros for client-side validations.
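
A minimal JDBC sketch of the backend MySQL connection noted above, in Scala; the URL, credentials, and orders table are placeholders:

```scala
import java.sql.DriverManager

object MySqlProbe {
  def main(args: Array[String]): Unit = {
    // Open a connection (URL and credentials are hypothetical).
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/appdb", "appuser", "secret")
    try {
      // Simple read to confirm connectivity; the orders table is illustrative.
      val stmt = conn.prepareStatement("SELECT COUNT(*) FROM orders")
      val rs = stmt.executeQuery()
      if (rs.next()) println(s"orders rows: ${rs.getInt(1)}")
    } finally conn.close()
  }
}
```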
Environment: Spring, Struts, JSF, EJBs, jQuery, MySQL, DB2, NetBeans, JBoss, CVS, VSS, Waterfall model, UML, JSP, Servlets, Ant, XML, EMC, Jira, IBM MQ, Tomcat Server, Linux, UNIX server
