Bigdata Engineer Resume
Brentwood, TN
SUMMARY
- Around 6 years of experience in the IT industry with proven expertise in Big Data analytics and Talend ETL for data integration.
- 2 years of experience in Big Data technologies such as Hadoop frameworks, Spark, Scala, HDFS, MapReduce, Hive, Pig, Storm, Kafka, YARN, HBase, Accumulo, Oozie, ZooKeeper, Flume, and Sqoop.
- Working experience on Cloudera Data Platform using VMware Player in a CentOS 6 Linux environment. Strong experience with Hadoop distributions such as Cloudera and Hortonworks.
- Worked on cloud computing infrastructure such as Amazon Web Services (AWS).
- Worked on HBase to load and retrieve data for real-time processing using the REST API.
- Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using various distributions such as Apache Spark, Cloudera, and the AWS service console.
- Worked with both Scala and Java; created frameworks for processing data pipelines through Spark.
- Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregating Functions (UDAF).
- Worked with Big Data distributions like Cloudera (CDH 3 and 4) with Cloudera Manager.
- Basic knowledge of Kudu, NiFi, Kylin, and Zeppelin with Apache Spark.
- Assisted in cluster maintenance, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Excellent ability to use analytical tools to mine data, perform predictive analysis, evaluate underlying patterns, and implement complex algorithms for data analysis.
- Hands-on experience with Spark, Spark Streaming, Spark MLlib, and Scala, including creating and handling DataFrames in Spark with Scala.
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL. Developed Pig Latin scripts and Spark SQL scripts for handling data formation.
- Hands-on experience with real-time data tools such as Kafka and Storm. Developed Sqoop scripts for importing large datasets from RDBMS to HDFS.
- Experience in writing UDFs in Java for Pig and Hive. Efficient in writing MapReduce programs for analyzing structured and unstructured data.
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience in using Apache Sqoop to import and export data to and from HDFS and Hive. Hands-on experience in setting up workflows using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Experienced and skilled Agile Developer with a strong record of excellent teamwork.
TECHNICAL SKILLS
Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Hadoop Streaming, Storm, Spark, Kafka, YARN, Crunch, ZooKeeper, HBase, Impala, Cassandra, MongoDB, Neo4j, Spark MLlib
ETL Tools: Talend Platform for Data Integration/ ESB/ Big Data Integration
Distributions: Cloudera, Hortonworks and AWS
Languages: Java, C, Scala, SQL, PL/SQL, Shell Script
JEE Technologies: JSP, Servlets and JDBC
Framework: Hibernate, Spring, Struts, JMS, EJB, JUnit, MRUnit, JAXB
Web Services: RESTful, SOAP
Web Servers: JBoss, Tomcat, WebLogic, WebSphere
IDEs: Eclipse, IntelliJ
Build Tools: Maven, SBT and Gradle
CI Tools: Hudson/Jenkins
Version Control: SVN, GitHub, Bitbucket
Databases: Cassandra, HBase, Oracle, SQL Server and Teradata
Cloud Solutions: AWS EMR, Redshift, S3
Reporting Tools: Jasper Reports, iReport, Tableau, QlikView
PROFESSIONAL EXPERIENCE
Confidential, Brentwood, TN
Bigdata Engineer
Responsibilities:
- Designed and built a custom, generic ETL framework as a Spark application in Scala for data loading and transformations.
- Worked on Hadoop cluster scaling from 4 nodes in the development environment to 8 nodes in pre-production.
- Managed data coming from different sources and was involved in HDFS maintenance and loading of structured and unstructured data.
- Involved in the data modelling of the new system for Cassandra from the existing legacy Oracle DB.
- Handled data transformations as per the business and mapping rules.
- Executed complex data aggregations on the calls and sales data for the BI dashboards.
- Involved in the configuration of Spark jobs through AWS Data Pipeline for weekly, monthly, and ad hoc executions.
- Created a custom logger to handle huge volumes of application log data.
- Created an error reprocessing framework that handles flagged errors during subsequent loads.
- Used Zeppelin and Beeline for querying Cassandra tables.
- Executed queries using Spark SQL for complex joins and data validation.
- Wrote Scala UDFs for handling complex transformation logic.
- Involved in the design of partition and clustering keys as per the data volume and query patterns on Cassandra tables.
- Created modular and independent components for Amazon S3 connections, data reads, and data stores.
- Designed a custom referential integrity framework on the NoSQL Cassandra tables for maintaining data integrity and relations in the data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Wrote Scala scripts for extracts from Cassandra Operational Data Store tables for comparison with legacy data.
- Designed and implemented a Spark cluster to analyze data in Cassandra.
- Worked with business users to understand the pharma sales and calls process for the selling solutions.
- Created the data ingestion file validation component for checksum, last modified, and threshold levels.
- Used Git for code base maintenance and JIRA for task tracking and monitoring.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
Environment: Hadoop, Spark Core, Spark-SQL, Spark-Streaming, Apache Mesos, MapReduce, HDFS, Hive, Java, Scala, Hue, SQL, Teradata, Pig, Sqoop, Tez, HBase, Accumulo, Cassandra, Zookeeper, PL/SQL, MySQL, DB2.
Confidential, Houston, TX
Hadoop Developer
Responsibilities:
- Involved in loading data from the UNIX file system to HDFS.
- Configured various property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on the job requirements.
- Worked on cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, racks, disk topology, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Set up the Hadoop cluster for the project and worked on the project using HQL and SQL.
- Involved in performing linear regression using the Scala API and Spark.
- Used D3.js to visualize the JSON data generated from Hive queries related to tumors.
- Involved in setting up the genomics pipeline in Hadoop.
- Used Spark to perform variant calling techniques in big data genomics.
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH5 Hadoop cluster on CentOS. Assisted with performance tuning, monitoring, and troubleshooting.
- Created MapReduce programs for some refined queries on big data.
- Involved in developing Pig UDFs to pre-process the data for analysis.
- Imported and exported data into HDFS and Hive using Sqoop.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Along with the infrastructure team, was involved in designing and developing a Kafka and Storm based data pipeline.
- Involved in developing complex MapReduce jobs for data cleaning.
- Implemented partitioning and bucketing in Hive.
- Mentored the analyst and test teams in writing Hive queries.
- Involved in setting up HBase to use HDFS.
- Extensively used Pig for data cleansing.
- Created RDDs in Spark and extracted data from the data warehouse onto the Spark RDDs.
- Used Spark with Scala/Python.
- Worked on Talend ETL to load data from various sources to Oracle DB.
- Extensively used components such as tMap, tHL7Input, tHL7Output, and tFile components to create Talend jobs.
- Worked on optimizing and improving the performance of the Talend jobs.
- Created topics on the Desktop portal using Spark Streaming with Kafka and Zookeeper.
- Involved in recovering lost data using the DAG process.
- Used DataStax JARs for this project.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
Environment: MapReduce, HDFS, Hive, Java (jdk1.7), Pig, Linux, XML, HBase, Zookeeper, Kafka, Sqoop, Flume, Oozie, Greenplum DB
Confidential, Atlanta, GA
Bigdata Developer
Responsibilities:
- Worked on a Data Integration project that includes enhancements of several sub-projects such as BonusBuy, Store Finance Planning, Tax Exemption, and ITC.
- Participated in all phases of the development life cycle with extensive involvement in the definition and design meetings and functional and technical walkthroughs.
- Developed jobs using best practices for error logging and exception handling.
- As an IHG standard, created a joblet using tStatCatcher, tLogCatcher, tAssertCatcher, and tFlowMeterCatcher and called this joblet in all the jobs to store processing stats in a database table to record job history.
- Consumed SAP IDOCs from the IDOCS API using tRestClient and inserted the IDOCs into ODS tables.
- Created promotion XML from the ODS tables for ZBB6 - ZBB9 promotions and sent them to the WCS team.
- Worked on the POSDM project to send different transaction data from the Xcenter database to EDW and DWHSE.
- Posted the transaction XMLs created from ODS tables to SAP for store balancing and auditing through the poslogu API using tRestRequest and tRestResponse components.
- Worked on Big Data jobs on the Spark framework for loading data from SQL Server to HDFS.
- Designed Talend jobs using Big Data components such as tSqoopImport, tSqoopExport, tSqoopMerge, tHDFSInput, tHDFSOutput, tHDFSPut, tHiveLoad, tHiveInput, and tHiveCreateTable.
- Created Hive tables and analyzed data on top of HDFS data.
- Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions, and incremental loading.
- Used paramsAPI to fetch the last runtime and update the job end time for incremental loading.
- For better performance, used MERGE statements in tOracleRow to load the data incrementally into the persistent table from the global temp tables.
- Worked on context variables and defined contexts for database connections and file paths to ease migration to different environments in a project.
- Integrated Java code inside Talend Studio by using components such as tJavaRow, tJava, tJavaFlex, and Routines.
- Created sub jobs in parallel to maximize performance and reduce overall job execution time, using the parallelize component of Talend in TIS and multithreaded execution in TOS.
- Added a code review synopsis and developer's self-review, documented the Unit Test Plan (UTP) with all possible scenarios, and attached them to Jira for code review and approval for deployment to higher environments.
- Wrote stored procedures, functions, and triggers to support the ETL processes.
- Scheduled jobs in TAC and Tidal and linked the job page and run book URL to the Tidal entry for DI DEV Operations use in case of job failure.
Environment: Talend 5.6/6.4, TAC, Tidal, Oracle, Netezza, Aginity, SVN, GIT, Bigdata, Spark, RabbitMQ, Postman, Confluence, Jira, Service Now, Putty, Linux, Java
Confidential
Junior Java Developer
Responsibilities:
- Worked on both WebLogic Portal 9.2 for portal development and WebLogic 8.1 for data services programming.
- Developed the presentation layer using JSP, HTML, and CSS, with client validations using JavaScript.
- Used GWT to send AJAX requests to the server and update data in the UI dynamically.
- Used Hibernate 3.0 in the data access layer to access and update information in the database.
- Used JDBC, SQL, and PL/SQL programming for storing, retrieving, and manipulating the data.
- Involved in the design and development of the ecommerce site using JSP, Servlets, EJBs, JavaScript, and JDBC.
- Used Eclipse 6.0 as the IDE for application development. Configured the Struts framework to implement the MVC design pattern.
- Validated all forms using the Struts validation framework and implemented the Tiles framework in the presentation layer.
- Designed and developed the GUI using JSP, HTML, DHTML, and CSS. Worked with JMS for the messaging interface.
- Used Hibernate for handling database transactions and persisting objects. Deployed the entire project on WebLogic application server.
- Used AJAX for interactive user operations and client-side validations. Used XSL transforms on certain XML data.
- Used XML for ORM mapping relations between the Java classes and the database.
- Developed an ANT script for compiling and deployment.
- Performed unit testing using JUnit.
- Used Subversion as the version control system. Extensively used Log4j for writing log files.
Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, JSP, EJB, Struts, Hibernate, WebLogic 8.0, HTML, AJAX, JavaScript, JDBC, XML, JMS, XSLT, UML, JUnit, Log4j, Eclipse 6.0.
