Big Data Engineer Resume
Bothell, WA
SUMMARY
- 7+ years of professional IT experience, including 3+ years with Big Data ecosystem technologies such as Spark, Hadoop, Pig, Hive, Sqoop, HBase, and Cassandra, and in designing and implementing MapReduce jobs to support distributed processing of large data sets on Hadoop clusters.
- Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
- Good knowledge of Spark as a fast, general engine for large-scale data processing, including Spark Core, Spark SQL, and Spark Streaming.
- Knowledge of Kafka clusters; developed a Spark Streaming application that consumes from Kafka to process Hadoop job logs (see the sketch at the end of this summary).
- Excellent understanding of Hadoop architecture and its components, such as HDFS (NameNode, DataNode), the JobTracker/TaskTracker daemons, and the MapReduce programming paradigm.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, YARN, and Lucene.
- Analyzed code issues using Splunk logs across all application and web servers.
- Experienced in monitoring Hadoop cluster environment using Ganglia.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Involved in generating automated scripts (YAML files) for Falcon and Oozie using Ruby.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Experienced in SQA (Software Quality Assurance), including manual and automated testing with tools such as Selenium RC/IDE/WebDriver/Grid, JUnit, LoadRunner, JProfiler, and RFT (Rational Functional Tester).
- Proficient in deploying applications on J2EE application servers such as WebSphere, WebLogic, GlassFish, Tuxedo, and JBoss, and on the Apache Tomcat web server.
- Expertise in developing applications using J2EE architectures/frameworks such as Struts, the Spring Framework, and the SDP (Qwest Communications) framework.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Experience with NoSQL data stores (HBase, Cassandra).
- Implemented POCs using Amazon cloud components (S3, EC2, Elastic Beanstalk, and SimpleDB).
- Experience in database design using PL/SQL to write stored procedures, functions, and triggers; strong experience writing complex queries for Oracle 8i/9i/10g.
- Extensive experience with multiple languages/technologies (Java/J2EE, C++, C, Perl, PHP, Ruby) and environments (JVM, Linux, various UNIX flavors, Windows).
- Ability to adapt to evolving technology; strong sense of responsibility and accomplishment.
- Ability to meet deadlines and handle multiple tasks, flexible in work schedules and possess good communication skills.
- Tuned Hadoop runtime parameters to minimize map-side disk spill and optimize mapper tasks and mapper output.
- Migrated data with Sqoop, transferring data between traditional relational databases and Hadoop.
- Used Flume to ingest data from various sources into Hadoop (HDFS).
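As a hedged illustration of the Spark Streaming and Kafka experience noted above, the sketch below consumes Hadoop job logs from a Kafka topic and reports ERROR lines per micro-batch; the topic name, broker address, and filter condition are assumptions for illustration only.

```python
# Minimal PySpark Streaming sketch: consume Hadoop job logs from Kafka
# and count ERROR lines in each 10-second micro-batch.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="HadoopJobLogProcessor")
ssc = StreamingContext(sc, 10)  # 10-second batches

# Direct Kafka stream (spark-streaming-kafka, DStream API).
# The topic name and broker address are hypothetical.
stream = KafkaUtils.createDirectStream(
    ssc, ["hadoop-job-logs"], {"metadata.broker.list": "localhost:9092"})

lines = stream.map(lambda kv: kv[1])                 # each record is (key, value)
errors = lines.filter(lambda line: "ERROR" in line)  # keep only error lines
errors.count().pprint()                              # per-batch error count

ssc.start()
ssc.awaitTermination()
```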
TECHNICAL SKILLS
Java Technologies: Java, JDK 1.2/1.3/1.4/1.5/1.6.
J2EE Technologies: JSP, JavaBeans, Servlets, JDBC, JPA 1.0, EJB 3.0, JNDI, JOLT, Amazon Cloud (S3, EC2).
Languages: C, C++, PL/SQL, and Java.
Frameworks: Hadoop (HDFS, MapReduce, Pig, Hive, HBase, Mahout, Oozie, ZooKeeper, YARN, Lucene, Spark), Struts 1.x, Spring 3.x.
Web Technologies: XHTML, HTML, XML, XSLT, XPath, CSS, DOM, JavaScript, AngularJS, AJAX, jQuery, GWT, WSDL, SOA, Web Services, Perl, VBScript.
Application Servers: WebLogic 8.1/9.1/10.x, WebSphere 5.x/6.x/7.x, Tuxedo Server 7.x/9.x, GlassFish Server 2.x, JBoss 4.x/5.x.
Web Servers: Apache Tomcat 4.0/5.5, Java Web Server 2.0.
Operating Systems: Windows XP/NT/2000/9x, MS-DOS, UNIX, Linux, Solaris, AIX.
Databases: SQL, PL/SQL, Oracle 9i/10g, MySQL, SQL Server, Microsoft Access, HBase, Cassandra.
IDEs: Eclipse 3.x, MyEclipse 8.x, RAD 7.x, JDeveloper 10.x.
Tools: Adobe, SQL Developer, Flume, Sqoop.
Version Control: WinCVS, VSS, PVCS, Subversion, Git.
PROFESSIONAL EXPERIENCE
Confidential, Bothell WA
Big Data Engineer
Responsibilities:
- Implemented Spark jobs using Python and Spark SQL for faster testing and processing of data (see the sketch following this role).
- Implemented real-time data ingestion and cluster handling using Kafka.
- Experience with core distributed computing and data mining libraries in Apache Spark.
- Worked on building BI reports in Tableau on top of Spark using Shark and Spark SQL.
- Participated in gathering and analyzing requirements and in designing technical documents for business requirements.
- Involved in different phases of big data projects, such as data acquisition, data processing, and data serving through dashboards.
- Imported/exported data between Oracle databases and HDFS using Sqoop and JDBC.
- Gathered data from different sources (internet, sensors, user behavior) and moved it to HDFS using optimized joins in MapReduce programs.
- Implemented custom InputFormats to handle input files received from Java applications for processing in MapReduce.
- Developed Hadoop MapReduce jobs and unit tests for them using MRUnit.
- Divided each data set into corresponding categories following the MapReduce binning design pattern.
- Implemented filter mappers to eliminate unnecessary records.
- Experience using Pig as an ETL tool for event joins, filters, transformations, and pre-aggregations.
- Created partitions and bucketing by state in Hive to handle structured data.
- Implemented dashboards that internally run HiveQL queries, including aggregation functions, basic Hive operations, and different kinds of joins.
- Implemented state-based business logic in Hive using generic UDFs; used HBase-Hive integration.
- Involved in creating data models for customer data using Cassandra Query Language (CQL).
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Created production jobs using Oozie workflows that integrated MapReduce, Sqoop, and Hive actions.
- Experience in managing and reviewing Hadoop log files.
- Experienced in monitoring the cluster using Cloudera Manager.
- Experienced in configuring Maven builds that integrated dependency checks, code-style checks, and test coverage.
- Involved in daily Scrum meetings to discuss sprint development and progress, and was active in making the meetings more productive.
Environment: Big Data, Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Python, Spark, Storm, Kafka, Cassandra, Linux, Oracle 11g, Cloudera Manager, Maven, MRUnit, JUnit
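A minimal PySpark and Spark SQL sketch of the kind of processing described in this role, assuming customer records already landed on HDFS as JSON; the path, table name, and column names are hypothetical.

```python
# Illustrative PySpark + Spark SQL job (Spark 1.x-style SQLContext):
# load customer data from HDFS and aggregate spend per state, similar
# to the HiveQL-style queries served to the dashboards.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="CustomerSpendByState")
sqlContext = SQLContext(sc)

customers = sqlContext.read.json("hdfs:///data/customers/")  # hypothetical path
customers.registerTempTable("customers")

spend_by_state = sqlContext.sql("""
    SELECT state,
           COUNT(*)         AS num_customers,
           SUM(total_spend) AS total_spend
    FROM customers
    GROUP BY state
    ORDER BY total_spend DESC
""")
spend_by_state.show(10)
```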
Confidential, Hoffman Estates, IL
Hadoop Developer
Responsibilities:
- Involved in design and development phases of the Software Development Life Cycle (SDLC) using the Scrum methodology.
- Developed a data pipeline using Pig and Java MapReduce to ingest customer profiles and purchase histories into HDFS for analysis.
- Deployed Hadoop clusters in fully distributed and pseudo-distributed modes.
- Used Pig as an ETL tool for transformations, event joins, filtering bot traffic, and pre-aggregations before storing the data in HDFS (an illustrative sketch follows this role).
- Analyzed the data using Pig and wrote Pig scripts that grouped, joined, and sorted the data.
- Collected data from distributed sources into data models, applied transformations and standardizations, and loaded the results into HBase for further processing.
- Applied pattern-matching algorithms in Hive to match customers' spending habits with loyalty points and stored the output in HBase.
- Involved in tuning Hadoop runtime parameters to minimize map-side disk spill and optimize mapper tasks and mapper output.
- Used configuration files and command-line arguments to set parameters and balance reducer load.
Environment: JDK 1.6, RHEL, HDFS, MapReduce, Hive, Pig, Sqoop, Flume
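As an illustrative stand-in for the Pig/Java MapReduce pipeline described in this role, the following Hadoop Streaming sketch in Python filters bot traffic and pre-aggregates purchases per customer before the results land on HDFS; the tab-separated field layout and the bot marker are assumptions.

```python
# mapper.py -- Hadoop Streaming stand-in for the Pig/Java MapReduce ETL:
# filter bot traffic and emit (customer_id, purchase_amount) pairs.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        continue                              # skip malformed records
    customer_id, user_agent, amount = fields[0], fields[1], fields[2]
    if "bot" in user_agent.lower():
        continue                              # filter bot traffic
    print("%s\t%s" % (customer_id, amount))
```

```python
# reducer.py -- pre-aggregate purchase amounts per customer; Hadoop
# Streaming delivers mapper output grouped and sorted by key.
import sys

current_key, total = None, 0.0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key:
        if current_key is not None:
            print("%s\t%.2f" % (current_key, total))
        current_key, total = key, 0.0
    total += float(value)
if current_key is not None:
    print("%s\t%.2f" % (current_key, total))
```

Both scripts would be wired together with the standard hadoop-streaming jar (-mapper mapper.py -reducer reducer.py, with -input and -output directories on HDFS).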
Confidential
Java Developer
Responsibilities:
- Used the Hibernate ORM tool as the persistence layer, using the database and configuration data to provide persistence services (and persistent objects) to the application.
- Responsible for developing the DAO layer using Spring MVC and Hibernate configuration XMLs, and for managing CRUD operations (insert, update, and delete).
- Implemented dependency injection using the Spring framework.
- Developed reusable services using BPEL to transfer data.
- Created JUnit test cases and developed JUnit classes.
- Configured Log4j to enable/disable logging in the application.
- Developed rich user interfaces using HTML, JSP, AJAX, JSTL, JavaScript, jQuery, and CSS.
- Implemented PL/SQL queries and procedures to perform database operations.
- Wrote UNIX shell scripts and used the UNIX environment to deploy the EAR and read logs.
- Implemented Log4j for logging purposes in the application.
Environment: Java, Jest, SOA Suite 10g (BPEL), Struts, Spring, Hibernate, Web Services (JAX-WS), JMS, EJB, WebLogic 10.1 Server, JDeveloper, SQL Developer, HTML, LDAP, Maven, XML, CSS, JavaScript, JSON, SQL, PL/SQL, Oracle, JUnit, CVS, and UNIX/Linux.
Confidential
SQL Developer
Responsibilities:
- Created new database objects such as procedures, functions, packages, triggers, indexes, and views using T-SQL in development and production SQL Server environments.
- Developed database triggers to enforce data integrity and additional referential integrity.
- Developed SQL queries to fetch complex data from different tables in remote databases using joins and database links, formatted the results into reports, and kept logs.
- Involved in performance tuning and monitoring of both T-SQL and PL/SQL blocks.
- Wrote T-SQL procedures to generate DML scripts that modified database objects dynamically based on user inputs.
Environment: SQL Server 7.0, Oracle 8i, Windows NT, C++, HTML, T-SQL, PL/SQL, SQL Loader.