Sr. Big Data Engineer Resume
Indianapolis, IN
SUMMARY:
- 7+ years of experience in various IT technologies, including hands-on experience in Big Data technologies.
- Proficient in installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Flume, Yarn, HBase, Sqoop, Spark, Storm, Kafka, Oozie, and Zookeeper.
- Strong understanding of Hadoop daemons and MapReduce concepts.
- Used Informatica PowerCenter for Extraction, Transformation, and Loading (ETL) of data from numerous sources such as flat files, XML documents, and databases.
- Experienced in developing UDFs for Pig and Hive using Java.
- Strong knowledge of Spark with Scala for large-scale streaming data processing.
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Worked with NoSQL databases like HBase, Cassandra, and MongoDB for extracting and storing large volumes of data.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.
- Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Able to develop MapReduce programs using Java and Python.
- Good understanding of and exposure to Python programming.
- Exported and imported data to and from Oracle using SQL Developer for analysis.
- Developed PL/SQL programs (Functions, Procedures, Packages and Triggers).
- Good experience in using Sqoop for traditional RDBMS data pulls.
- Worked with different distributions of Hadoop, such as Hortonworks and Cloudera.
- Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors.
- Extensive experience in Shell scripting.
- Extensive use of open-source software and web/application servers such as the Eclipse 3.x IDE and Apache Tomcat 6.0.
- Experience in designing components using UML use case, class, sequence, deployment, and component diagrams for the requirements.
- Strong skills in J2EE, J2SE, Servlets, Spring, Hibernate, JUnit, JSP, JDBC, Java multithreading, object-oriented design patterns, exception handling, garbage collection, HTML, Struts, Enterprise Java Beans, RMI, JNDI, and XML-related technologies.
- Involved in report development using reporting tools like Tableau; used Excel sheets, flat files, and CSV files to generate Tableau ad hoc reports.
- Broad design, development, and testing experience with Talend Integration Suite, and knowledge of performance tuning of mappings.
- Experience in understanding security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Experience with cluster monitoring tools like Ambari and Apache Hue.
- Solid technical foundation, strong analytical ability, team player, and goal oriented, with a commitment to excellence.
- Outstanding communication and presentation skills, willing to learn, adapt to new technologies and third-party products.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm and ZooKeeper.
Languages: C, Java, Python, Scala, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
Frameworks: MVC, Struts, Spring, Hibernate
NoSQL Databases: HBase, Cassandra, MongoDB
Operating Systems: HP-UX, RedHat Linux, Ubuntu Linux, and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL.
Web/Application servers: Apache Tomcat, WebLogic, JBoss.
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDE: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
Version control: SVN, CVS, GIT
Web Services: REST, SOAP
PROFESSIONAL EXPERIENCE:
Confidential, Indianapolis, IN
Sr. Big Data Engineer
Environment: UNIX, Linux, Java, Apache HDFS, MapReduce, Spark, Pig, Hive, HBase, Kafka, Sqoop, NoSQL, AWS (S3 buckets), EMR cluster, SOLR.
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop components.
- Solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (see the Spark SQL sketch after this list).
- Knowledge of architecture and functionality of NOSQL DB like HBase.
- Used S3 for data storage, responsible for handling huge amounts of data.
- Used Amazon EMR clusters, backed by EC2 instances, for data pre-analysis.
- Used Kafka to obtain near-real-time data.
- Good experience in writing data ingestion jobs with tools such as Sqoop.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Implemented Spark jobs in Scala using the DataFrame and Spark SQL APIs and pair RDDs for faster processing of data; created RDDs, DataFrames, and Datasets.
- Performed batch processing using Spark implemented in Scala.
- Performed extensive data validation using Hive and wrote Hive UDFs.
- Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
- Used Cloudera data platform for deploying Hadoop in some modules.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs; wrote extensive Python and shell scripts to provision and spin up virtualized Hadoop clusters.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming (a sketch of this pipeline follows this list).
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Involved in bulk-loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
- Configured the Talend ETL tool for data filtering.
- Processed the data in HBase using Apache Crunch pipelines, a MapReduce programming model that is efficient for processing Avro data formats.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
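As context for the Hive-to-Spark conversion work above, here is a minimal Scala sketch of rewriting a Hive aggregation as a Spark SQL/DataFrame job; the database, table, and column names are illustrative placeholders, not the actual project schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSql {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so existing Hive tables can be queried directly
    val spark = SparkSession.builder()
      .appName("HiveToSparkSql")
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent of a Hive aggregation query, expressed with the DataFrame API
    // (table and column names below are placeholders)
    val orders = spark.table("sales_db.orders")
    val dailyTotals = orders
      .filter(col("order_status") === "COMPLETE")
      .groupBy(col("order_date"))
      .agg(sum(col("order_amount")).alias("total_amount"))

    // Persist the result back as a Hive table for downstream reporting
    dailyTotals.write.mode("overwrite").saveAsTable("sales_db.daily_order_totals")

    spark.stop()
  }
}
```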
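As context for the Kafka-to-Spark Streaming pipeline above, here is a minimal Scala sketch using the spark-streaming-kafka-0-10 integration; the broker addresses, topic name, group id, and the ERROR filter are assumptions for illustration only.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ServerLogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ServerLogStream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Kafka consumer settings; broker addresses, group id, and topic are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "server-log-consumers",
      "auto.offset.reset" -> "latest"
    )
    val topics = Seq("server-logs")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

    // Keep only error lines and count them per micro-batch
    stream.map(record => record.value)
      .filter(_.contains("ERROR"))
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```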
Confidential, CA
Big Data Engineer
Environment: Hadoop, Cloudera, Talend, Scala, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, Yarn, NDM, Quality Center 9.2, Informatica, Windows & Microsoft Office.
Responsibilities:
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Worked on batch processing of data sources using Apache Spark and Elasticsearch.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance (see the migration sketch after this list).
- Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Loaded data from different sources (databases and files) into Hive using the Talend tool.
- Conducted POCs for ingesting data using Flume.
- Used all major ETL transformations to load the tables through Informatica mappings.
- Created Hive queries and tables that helped line of business identify trends by applying strategies on historical data before promoting them to production.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Worked on managing and reviewing Hadoop log files. Tested and reported defects in an Agile Methodology perspective.
- Conducted and participated in project team meetings to gather status and discuss issues and action items.
- Provided support for research and resolution of testing issues.
- Coordinated with the business for UAT sign-off.
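As context for the Pig-to-Spark migration above, here is a minimal Scala sketch of the kind of rewrite involved; the input path, schema options, and grouping columns are hypothetical stand-ins for the actual data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PigToDataFrames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PigToDataFrames").getOrCreate()

    // Rough Pig equivalent of what follows:
    //   txns  = LOAD '/data/raw/transactions' USING PigStorage(',') AS (...);
    //   byCat = GROUP txns BY category;
    //   stats = FOREACH byCat GENERATE group, COUNT(txns), SUM(txns.amount);
    val txns = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/raw/transactions")   // placeholder HDFS path

    val stats = txns.groupBy("category")
      .agg(count("*").alias("txn_count"), sum("amount").alias("total_amount"))

    // Write the refined result for business analysis (placeholder output path)
    stats.write.mode("overwrite").parquet("/data/refined/transaction_stats")
    spark.stop()
  }
}
```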
Confidential, CA
Hadoop Developer
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Oozie, Nagios, Ganglia, LINUX, Hue
Responsibilities:
- Worked on Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
- Worked on debugging, performance tuning of Hive & Pig Jobs
- Created HBase tables to store various data formats of PII data coming from different portfolios
- Implemented test scripts to support test driven development and continuous integration
- Worked on tuning the performance of Pig queries
- Involved in loading data from LINUX file system to HDFS
- Importing and exporting data into HDFS and Hive using Sqoop
- Experience working on processing unstructured data using Pig and Hive
- Supported MapReduce programs running on the cluster
- Gained experience in managing and reviewing Hadoop log files
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
- Assisted in monitoring the Hadoop cluster using tools like Nagios and Ganglia
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts
Confidential, NJ
Data Analyst
Environment: Erwin, Tableau, R, MS Excel, SQL, MS-SQL Databases.
Responsibilities:
- Worked as a Data Analyst to generate data models using Erwin and developed relational database systems.
- Involved in data analysis, primarily identifying the datasets, source data, metadata, data formats, and data definitions.
- Installed and worked with R and Tableau in creating visualizations for the data.
- Documented the complete process flow to describe program development, logic, testing, implementation and application integration.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Devised procedures that solve complex business problems with due considerations for hardware/software capacity and limitations, operating times and desired results.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Involved in the implementation of metadata repository, maintaining data quality, data cleaning procedures, data transformations, stored procedures, triggers and execution plans.
- Responsible for data extraction, data aggregation, building of centralized data solutions and quantitative analysis to generate business insights.
- Created and designed reports that use gathered metrics to infer and draw logical conclusions of past and future behavior.
- Worked hands-on with ETL processes.
- Worked closely with ETL, SSIS, SSRS developers to explain the data transformations using logic.
- Prepared the workspace for markdown, performed data and statistical analysis, and generated reports, listings, and graphs.
Confidential, Mclean, VA
Big Data Developer
Environment: Map Reduce, HDFS, Sqoop, Pig, Hive, Kafka, Flume, Shell Scripts, Oozie, ETL, Agile, SQL
Responsibilities:
- Developed and wrote MapReduce jobs.
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Developed Pig scripts for transforming data, making extensive use of event joins, filtering, and pre-aggregations.
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Good experience in developing Hive DDLs to create, alter, and drop Hive tables.
- Worked on HiveQL.
- Implemented Kafka for streaming data, then filtered and processed the data (a producer sketch follows this list).
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Developed Shell scripts for scheduling and automating the job flow.
- Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
- Developed MapReduce jobs to calculate the total data usage by commercial routers in different locations, and developed MapReduce programs for sorting data in HDFS.
- Performed load balancing of ETL processes and database performance tuning for ETL processing tools.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Optimized Hive queries to extract the customer information from HDFS.
- Performed data scrubbing and processing, using Oozie for workflow automation and coordination.
- Involved in various phases of development analyzed and developed the system going through Agile Scrum methodology.
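As context for the Kafka work above, here is a minimal Scala sketch of a producer publishing log lines to a topic via the standard Kafka Java client; the broker address, topic name, and log file path are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.io.Source

object LogProducer {
  def main(args: Array[String]): Unit = {
    // Producer configuration; the broker address is a placeholder
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    // Read a local log file (placeholder path) and publish each line to the topic
    val lines = Source.fromFile("/var/log/app/server.log").getLines()
    lines.foreach { line =>
      producer.send(new ProducerRecord[String, String]("server-logs", line))
    }

    producer.flush()
    producer.close()
  }
}
```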