
Big Data Engineer Resume


Bothell, WA

SUMMARY:

  • 7+ years of professional IT experience, including 4+ years with Big Data ecosystem technologies such as Spark, Hadoop, Pig, Hive, Sqoop, HBase, and Cassandra, designing and implementing MapReduce jobs to support distributed processing of large data sets on Hadoop clusters.
  • Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
  • Good knowledge of Spark as a fast, general engine for large-scale data processing, covering Spark Core, Spark SQL, and Spark Streaming.
  • Knowledge of Kafka clusters; developed a Spark Streaming application that consumes from Kafka to process Hadoop job logs (a minimal sketch follows this list).
  • Excellent understanding of Hadoop architecture and its components, such as HDFS (NameNode, DataNode), MapReduce (JobTracker, TaskTracker), and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, YARN, and Lucene.
  • Analyzed code issues using Splunk logs across all application and web servers.
  • Experienced in monitoring Hadoop cluster environments using Ganglia.
  • Very good experience with the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Generated automated Falcon and Oozie scripts (YAML files) using Ruby.
  • Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
  • Experienced in SQA (Software Quality Assurance), including manual and automated testing with tools such as Selenium RC/IDE/WebDriver/Grid, JUnit, LoadRunner, JProfiler, and RFT (Rational Functional Tester).
  • Proficient in deploying applications on J2EE application servers such as WebSphere, WebLogic, GlassFish, Tuxedo, and JBoss, and on the Apache Tomcat web server.
  • Expertise in developing applications using J2EE Architectures / frameworks like Struts, Spring Framework and SDP (Qwest Communications) Framework.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
  • Experience in NoSQL data stores (HBase, Cassandra).
  • Implemented POCs using Amazon cloud components (S3, EC2, Elastic Beanstalk, and SimpleDB).
  • Experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries for Oracle 8i/9i/10g.
  • Tuned Hadoop runtime parameters to minimize map-side disk spill and to optimize mapper tasks and mapper output.
  • Migrated data between traditional relational databases and Hadoop using Sqoop.
  • Used Flume to ingest data from various sources into Hadoop (HDFS).
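
For illustration, a minimal sketch of the Spark Streaming consumer mentioned above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, and group id are placeholders:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class JobLogStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("hadoop-job-log-stream");
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");          // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "job-log-consumers");              // placeholder group

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    ssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("hadoop-job-logs"), kafkaParams));

            // Surface ERROR lines from the job logs in each micro-batch.
            stream.map(ConsumerRecord::value)
                  .filter(line -> line.contains("ERROR"))
                  .print();

            ssc.start();
            ssc.awaitTermination();
        }
    }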

TECHNICAL SKILLS:

Java Technologies: Java (JDK 1.2, 1.3, 1.4, 1.5, 1.6).

J2EE Technologies: JSP, JavaBeans, Servlets, JDBC, JPA 1.0, EJB 3.0, JNDI, JOLT, Amazon Cloud (S3, EC2).

Languages: C, C++, PL/SQL, and Java.

Frameworks: Hadoop (HDFS, MapReduce, Pig, Hive, HBase, Mahout, Oozie, ZooKeeper, YARN, Lucene, Spark), Struts 1.x, and Spring 3.x.

Web Technologies: XHTML, HTML, XML, XSLT, XPath, CSS, DOM, JavaScript, AngularJS, AJAX, jQuery, WSDL, SOA, Web Services, GWT, Perl, VB Script.

Application Servers: WebLogic 8.1/9.1/10.x, WebSphere 5.x/6.x/7.x, Tuxedo 7.x/9.x, GlassFish 2.x, JBoss 4.x/5.x.

Web Servers: Apache Tomcat 4.0/5.5, Java Web Server 2.0.

Operating Systems: Windows XP/NT/9x/2000, MS-DOS, UNIX, Linux, Solaris, and AIX.

Databases: SQL, PL/SQL, Oracle 9i/10g, MySQL, SQL Server, Microsoft Access, HBase, and Cassandra.

IDEs: Eclipse 3.x, MyEclipse 8.x, RAD 7.x, and JDeveloper 10.x.

Tools: Adobe, SQL Developer, Flume, and Sqoop.




PROFESSIONAL EXPERIENCE:

Confidential, Bothell, WA

Big Data Engineer

Responsibilities:

  • Implemented Spark jobs using Python and Spark SQL for faster testing and processing of data.
  • Implemented data ingestion and handled clusters for real-time processing using Kafka.
  • Experience with core distributed computing and data mining libraries in Apache Spark.
  • Worked on building BI reports in Tableau on top of Spark using Shark and Spark SQL (see the Spark SQL sketch after this list).
  • Participated in gathering and analyzing requirements and in designing technical documents for business requirements.
  • Involved in different phases of big data projects, such as data acquisition, data processing, and data serving via dashboards.
  • Imported/exported data between Oracle databases and HDFS using Sqoop and JDBC.
  • Gathered data from different sources such as the Internet, sensors, and user behavior, and moved it to HDFS using optimized joins in MapReduce programs.
  • Implemented custom InputFormats to handle input files received from Java applications for processing in MapReduce.
  • Developed MRUnit tests to unit-test Hadoop MapReduce jobs.
  • Divided each data set into corresponding categories following the MapReduce binning design pattern (a sketch of this pattern also follows this list).
  • Implemented filter mappers to eliminate unnecessary records.
  • Experience using Pig as an ETL tool for event joins, filters, transformations, and pre-aggregations.
  • Created partitions and bucketing by state in Hive to handle structured data.
  • Implemented dashboards that internally issue HiveQL queries involving aggregation functions, basic Hive operations, and different kinds of joins.
  • Implemented state-based business logic in Hive using generic UDFs; used HBase-Hive integration.
  • Involved in creating data models for customer data using Cassandra Query Language (CQL).
  • Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
  • Created production jobs using Oozie workflows that integrated actions such as MapReduce, Sqoop, and Hive.
  • Experience in managing and reviewing Hadoop log files.
  • Experienced in monitoring the cluster using Cloudera Manager.
  • Experienced in configuring Maven builds that integrated dependency checks, Checkstyle, and test coverage.
  • Involved in daily Scrum meetings to discuss the development/progress of sprints and was active in making the meetings more productive.
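
For illustration, a minimal sketch of a Spark SQL job like the ones feeding the Tableau reports above, written against the later SparkSession API for brevity; the database, table, and column names are assumptions:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class StateAggregates {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("state-aggregates")
                    .enableHiveSupport()              // read the Hive-managed tables directly
                    .getOrCreate();

            // Aggregate a hypothetical transactions table per state for the BI layer.
            Dataset<Row> byState = spark.sql(
                    "SELECT state, COUNT(*) AS events, SUM(amount) AS total "
                  + "FROM warehouse.transactions GROUP BY state");

            // Persist the result where the reporting tool picks it up.
            byState.write().mode("overwrite").saveAsTable("warehouse.state_totals");
            spark.stop();
        }
    }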
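
And a minimal sketch of the binning pattern from the list above: a map-only job that routes each record to an output file named after its category. The tab delimiter and category column position are assumptions, and the driver is expected to set zero reducers:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class BinningMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private MultipleOutputs<Text, NullWritable> outputs;

        @Override
        protected void setup(Context context) {
            outputs = new MultipleOutputs<>(context);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            String category = fields[2];                 // assumed category column
            // Each record lands in an output file under a bin named after its category.
            outputs.write(value, NullWritable.get(), "bins/" + category);
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            outputs.close();                             // flush all bin files
        }
    }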

Environment: Big Data, Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Python, Spark, Storm, Kafka, Cassandra, Linux, Oracle 11g, Cloudera Manager, Maven, MRUnit, JUnit

Confidential, Bellevue, WA

Hadoop Developer

Responsibilities:

  • Involved in ingesting data into IDW staging directly from BEAM (an in-built component for ingesting real-time data into Hadoop), using Apache Storm to push the same data into HDFS, where Hive tables could be defined over the loaded data.
  • Involved in creating and executing BTEQ scripts to load data from the Hadoop staging area into Teradata.
  • Tested the BTEQ scripts before deploying them to the production cluster.
  • Wrote Oozie workflows to run multiple Hive, shell-script, and Pig jobs, triggered independently by time and data availability.
  • Proposed an automated system using shell scripts for the Hadoop job deployment process.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Involved in writing Hive and Pig scripts for complex transformations.
  • Implemented Hive custom UDFs to achieve comprehensive data analysis (a sample UDF sketch follows this list).
  • Worked in Agile development approach.
  • Prepared Pig scripts and Spark SQL to handle all the transformations specified in the S2TMs, including SCD1 and SCD2 scenarios.
  • Created tables in Teradata, exported the transformed data from HDFS using Sqoop, and wrote BTEQ scripts to handle updates and inserts of records.
  • Created Teradata views on top of the table as per the business requirement.
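
For illustration, a minimal sketch of a simple custom Hive UDF of the kind described above; the class name and the normalization rule are purely illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Illustrative UDF: trims and upper-cases a code column.
    // Register with: CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    public final class NormalizeCode extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;              // preserve SQL NULL semantics
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }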

Environment: Hortonworks Data Platform (Hadoop), HDFS, HBase, Hive, Java, Sqoop, Oracle, MySQL.

Confidential, Hoffman Estates, IL

Hadoop Developer

Responsibilities:

  • Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
  • Developed data pipelines using Pig and Java MapReduce to ingest customer profiles and purchase histories into HDFS for analysis.
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
  • Used Pig as an ETL tool for transformations, event joins, filtering bot traffic, and some pre-aggregations before storing the data in HDFS.
  • Analyzed the data using Pig and wrote Pig scripts that grouped, joined, and sorted the data.
  • Collected data from distributed sources into data models, applied transformations and standardizations, and loaded the results into HBase for further processing.
  • Applied pattern-matching algorithms to match customers' spending habits with loyalty points using Hive, and stored the output in HBase.
  • Involved in tuning Hadoop runtime parameters to minimize map-side disk spill and to optimize mapper tasks and mapper output.
  • Used configuration files and command-line arguments to set parameters and balance reducer loading (see the driver sketch after this list).
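
As a sketch of the configuration-driven tuning in the last bullet, a driver built on ToolRunner so that parameters such as the reducer count come from -D flags rather than code; the job name and the identity map/reduce setup are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Run as: hadoop jar etl.jar EtlDriver -D mapreduce.job.reduces=24 <in> <out>
    public class EtlDriver extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already carries any -D overrides parsed by ToolRunner.
            Job job = Job.getInstance(getConf(), "purchase-history-etl");
            job.setJarByClass(EtlDriver.class);
            job.setOutputKeyClass(LongWritable.class);   // identity map/reduce for the sketch
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new EtlDriver(), args));
        }
    }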

Environment: Cloudera Data Platform, JDK1.6, RHEL, HDFS, Map-Reduce, Hive, Pig, Sqoop, Flume

Confidential

Java Developer

Responsibilities:

  • Used the Hibernate ORM tool as the persistence layer, using the database and configuration data to provide persistence services (and persistent objects) to the application.
  • Responsible for developing the DAO layer using Spring MVC and Hibernate configuration XMLs, and for managing CRUD operations (insert, update, and delete); a DAO sketch follows this list.
  • Implemented dependency injection using the Spring framework.
  • Developed reusable services using BPEL to transfer data.
  • Created JUnit test cases and developed JUnit classes.
  • Configured Log4j to enable/disable logging in the application.
  • Developed rich user interfaces using HTML, JSP, AJAX, JSTL, JavaScript, jQuery, and CSS.
  • Implemented PL/SQL queries and procedures to perform database operations.
  • Wrote UNIX Shell scripts and used UNIX environment to deploy the EAR and read the logs.
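
For illustration, a minimal sketch of such a DAO; Customer is a hypothetical Hibernate-mapped entity, and the SessionFactory and transaction manager are assumed to be wired in the Spring configuration XML:

    import java.util.List;
    import org.hibernate.SessionFactory;
    import org.springframework.transaction.annotation.Transactional;

    public class CustomerDao {
        private SessionFactory sessionFactory;

        public void setSessionFactory(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;    // setter injection from Spring XML
        }

        @Transactional
        public void save(Customer customer) {        // Customer: hypothetical mapped entity
            sessionFactory.getCurrentSession().saveOrUpdate(customer);
        }

        @Transactional
        public void delete(Customer customer) {
            sessionFactory.getCurrentSession().delete(customer);
        }

        @Transactional(readOnly = true)
        @SuppressWarnings("unchecked")
        public List<Customer> findAll() {
            return sessionFactory.getCurrentSession().createQuery("from Customer").list();
        }
    }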

Environment: Java, Jest, SOA Suite 10g (BPEL), Struts, Spring, Hibernate, Web Services (JAX-WS), JMS, EJB, WebLogic 10.1 Server, JDeveloper, SQL Developer, HTML, LDAP, Maven, XML, CSS, JavaScript, JSON, SQL, PL/SQL, Oracle, JUnit, CVS, and UNIX/Linux.

Confidential

SQL Developer

Responsibilities:

  • Created new database objects such as procedures, functions, packages, triggers, indexes, and views using Confidential-SQL in development and production environments for SQL Server. Developed database triggers to enforce data integrity and additional referential integrity.
  • Developed SQL queries to fetch complex data from different tables in remote databases using joins and database links, formatted the results into reports, and kept logs.
  • Involved in performance tuning and monitoring of both Confidential-SQL and PL/SQL blocks.
  • Wrote Confidential-SQL procedures to generate DML scripts that modified database objects dynamically based on user inputs (an illustrative JDBC invocation follows this list).
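
For illustration, a minimal JDBC sketch of invoking one of these stored procedures from application code; the connection URL, credentials, and procedure name/signature are placeholders:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class ProcCaller {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@db-host:1521:ORCL", "app_user", "secret");
                 CallableStatement call = conn.prepareCall("{call refresh_report_log(?)}")) {
                call.setString(1, "DAILY");   // IN parameter for the hypothetical procedure
                call.execute();
            }
        }
    }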

Environment: SQL Server 7.0, Oracle 8i, Windows NT, C++, HTML, Confidential-SQL, PL/SQL, SQL*Loader.
