
Big Data Developer Resume


Chicago, IL

SUMMARY:

  • Over 5 years of professional IT experience in analysis, design and development using Hadoop, Java/J2EE and SQL.
  • Strong experience with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Spark, Spark Streaming, Impala, Oozie, Flume, HCatalog, Kafka and Sqoop.
  • Strong experience in processing large sets of structured and semi-structured data and supporting systems application architecture.
  • Good experience in assessing business rules, collaborating with stakeholders, and performing source-to-target data mapping, design and review.
  • Experience optimizing ETL workflows.
  • Experience with the Hadoop 2.0 architecture YARN (MRv2) and developing YARN applications on it.
  • Experience in writing UNIX shell scripts.
  • Worked on analyzing the data using HiveQL.
  • Worked on writing custom UDFs to extend Hive functionality (a minimal sketch follows this summary).
  • Worked on managing and reviewing Hadoop log files.
  • Worked on Sqoop to move data from relational databases into Hadoop, and used Flume to collect data and populate HDFS.
  • Worked on HBase to perform quick lookups such as updates, inserts and deletes in Hadoop.
  • Very good knowledge of Spark architecture and of driving it with Python scripts.
  • Experience with Cloudera, Hortonworks and MapR distributions.
  • Worked on the Cloudera Hadoop and Spark developer environment with on-demand lab work using a virtual machine on the cloud.
  • Experience with Apache Spark’s Core, Spark SQL, Streaming and MLlib components.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Experience in Data modeling, complex data structures, Data processing, Data quality, Data life cycle.
  • Experience in running Map Reduce and Spark jobs over YARN.
  • Expertise in interactive data visualization and analysis with BI tools like Tableau.
  • Hands-on experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Participated in design reviews, code reviews, unit testing and integration testing.
  • Strong experience with SQL, PL/SQL and database concepts.
  • Experience with NoSQL databases such as HBase and MongoDB.
  • A very good understanding of job workflow scheduling and monitoring tools like Oozie and Control-M.
  • Knowledge of administrative tasks such as installing Hadoop and its ecosystem components such as Hive and Pig.
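
As a concrete illustration of the Hive UDF work noted above, the following is a minimal Scala sketch of a custom Hive UDF; the package, class name and masking logic are hypothetical placeholders rather than code from any project listed below.

    package com.example.hive

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF: masks all but the last four characters of a string value.
    // After packaging into a JAR it would be registered in Hive with, for example:
    //   ADD JAR mask-udf.jar;
    //   CREATE TEMPORARY FUNCTION mask AS 'com.example.hive.MaskUdf';
    class MaskUdf extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val value = input.toString
        val masked =
          if (value.length <= 4) value
          else "*" * (value.length - 4) + value.takeRight(4)
        new Text(masked)
      }
    }

Once registered, the function can be called from HiveQL like any built-in, e.g. SELECT mask(card_number) FROM payments.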

TECHNICAL SKILLS:

Hadoop/Big Data ecosystems: HDFS, Spark, Spark Streaming, Kafka, Flume, Hive, MapReduce, Impala, Sqoop, Oozie, Zookeeper.

NoSQL Databases: HBase, MongoDB

Tools and IDEs: Eclipse, IntelliJ IDEA, Aqua Data Studio, Altova MapForce, NetBeans, Maven, SBT.

Languages: C, C++, Java, J2EE, PL/SQL, MapReduce, Pig Latin, HiveQL, UNIX shell scripting, Perl, Python and Scala

Databases and Data Warehousing: Teradata, Oracle, SQL Server, MySQL, DB2, PostgreSQL

ETL tools: DataStage, Teradata

Operating Systems: Windows 95/98/2000/XP/Vista/7, Unix, Linux

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Big Data Developer

Responsibilities:

  • Developed Spark core and Spark SQL/Streaming scripts using Scala for faster data processing.
  • Worked on HBase Bulk Loading using Spark.
  • Worked on Spark SQL for structured data processing.
  • Worked on loading delta records into an HBase table and used Apache Phoenix to pull and aggregate the data (a minimal sketch follows this list).
  • Used ETL processes to extract, transform and load data into the staging area and data warehouse.
  • Used Sqoop import/export jobs for transferring data between HDFS and Oracle.
  • Worked on creating tables in Hive and loading data into them.
  • Experienced in optimizing the Spark and Hive jobs.
  • Created configuration and parameter files for the reusable shell scripts.
  • Experience in scheduling, monitoring and supporting jobs using Control-M.
  • Generated surrogate keys for dimension and fact tables for indexing and faster access of data in the data warehouse.
  • Worked on creating Hive tables and querying them using HiveQL.
  • Worked on the dimensional data model using Erwin Data Modeler (star schema).
  • Experience working with the Change Capture stage and Slowly Changing Dimension (SCD) stage.
  • Involved in unit testing and integration testing to test jobs and the flow.
  • Worked with the ETL tool for importing metadata from the repository, creating new job categories and creating new data elements.
  • Created context variables to store metadata information for all the sources and targets.
  • Experience in troubleshooting by tuning mappings and identifying and resolving performance bottlenecks at various levels such as source, target, mapping and session.
  • Created data flow diagrams, data mapping from Source to Stage and Stage to Target mapping documents indicating the source tables, columns, data types, transformations required and business rules to be applied.
  • Responsible for development and testing of conversion programs for importing data from text files into an Oracle database using shell scripts and SQL*Loader.
  • Implemented the incremental loading of Dimension and Fact tables.
  • Involved in the daily maintenance of the database that involved monitoring the daily run of the scripts as well as troubleshooting in the event of any errors in the entire process.
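
A minimal Scala sketch of the kind of Spark SQL job described in this role, assuming a Hive staging table and the Phoenix-Spark connector; the database, table, columns and ZooKeeper quorum are placeholders, not actual project values.

    import org.apache.spark.sql.SparkSession

    object DeltaToHBase {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("delta-to-hbase")      // hypothetical job name
          .enableHiveSupport()
          .getOrCreate()

        // Pull only the delta records from a Hive staging table (placeholder names).
        val delta = spark.sql(
          """SELECT id, customer_id, amount, load_ts
            |FROM stage_db.transactions
            |WHERE load_ts > date_sub(current_date(), 1)""".stripMargin)

        // Upsert into an HBase table exposed through Phoenix (placeholder table and quorum);
        // the connector requires SaveMode.Overwrite even though it performs upserts.
        delta.write
          .format("org.apache.phoenix.spark")
          .option("table", "TRANSACTIONS")
          .option("zkUrl", "zk-host:2181")
          .mode("overwrite")
          .save()

        spark.stop()
      }
    }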

Environment: MapR, Spark, Spark SQL, shell scripting, HBase, Apache Phoenix, Scala, Python, Control-M, Hive, Sqoop, HDFS, Oracle, SQL Developer

Confidential, Dearborn, MI

Big Data Developer

Responsibilities:

  • Developed Spark core and Spark SQL/Streaming scripts using Scala for faster data processing.
  • Developed a data pipeline using Kafka, HBase, Spark and Hive to ingest, transform and analyze customer behavioral data.
  • Worked on designing and implementing the ingestion, enrichment and processing of massive amounts of market information.
  • Developed the data pipelines using Spark and Hive to ingest, transform and analyze the data.
  • Very good experience programming in Scala; built a Scala prototype for the application requirement with a focus on functional programming.
  • Developed Spark Streaming jobs in Scala to consume data from Kafka topics, apply transformations and insert the results into HBase tables (a minimal sketch follows this list).
  • Developed workflows in Oozie to automate loading data into HDFS and pre-processing and analyzing the data with MapReduce and Hive jobs.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Managed and reviewed Hadoop log files to identify issues when jobs fail.
  • Worked on ingesting real-time data into HDFS using Flume.
  • Involved in writing shell scripts for scheduling and automating tasks.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Experience in using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
  • Used Hue for UI-based Hive and Impala query execution, Oozie scheduling and creating tables in Hive.
  • Involved in data ingestion into HDFS using Sqoop and into Hive tables using Perl scripts from different sources, using connectors such as JDBC and import parameters.
  • Worked on loading CSV files from an S3 bucket into the respective tables in AWS Redshift using Python scripts.
  • Created HBase tables to store variable formats of data coming from different portfolios.
  • Created AWS S3 buckets, performed folder management in each bucket and managed CloudTrail logs and objects within each bucket.
  • Managed and scheduled Oozie workflow jobs to remove duplicate log data files in HDFS.
  • Used Scala in writing the code for all the use cases in Spark and Spark SQL.
  • Used Git for version control and Jenkins for continuous integration and continuous deployment.
  • Strong knowledge of automating various production processes using UNIX/Linux shell scripting and Perl.
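
A minimal Scala sketch of the Kafka-to-HBase Spark Streaming pattern described above, assuming the spark-streaming-kafka-0-10 connector; the broker, topic, consumer group and transformation are placeholders, and the HBase write itself is only indicated in comments.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHBaseStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-to-hbase")  // hypothetical app name
        val ssc  = new StreamingContext(conf, Seconds(10))       // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",                // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "behavior-consumer",           // placeholder group
          "auto.offset.reset"  -> "latest"
        )

        // Subscribe to a placeholder topic and apply a stand-in transformation.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("customer-events"), kafkaParams))

        stream.map(record => record.value.toUpperCase)
          .foreachRDD { rdd =>
            rdd.foreachPartition { events =>
              // Open an HBase connection per partition and put the events (omitted here).
              events.foreach(println)
            }
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }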

Environment: Hadoop, Cloudera, Spark, Spark SQL, Spark Streaming, HDFS, Hive, HBase, Kafka, Flume, AWS EC2, S3, Python, Scala, Data Lake, Solr, IntelliJ IDEA, YARN, Oozie, Perl, Git, Jenkins.

Confidential, Basking Ridge, NJ

Big Data Developer

Responsibilities:

  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Involved in importing and exporting data from HBase using Spark.
  • Involved in creating an XML adapter for HBase using Spark APIs (HBase integration).
  • Worked with different source data file formats like XML, JSON and CSV.
  • Hands-on experience in designing and developing ETL data flows using Hadoop and Spark ecosystem components.
  • Experience in designing and implementing workflow jobs using Talend and Unix/Linux scripting to perform ETL on the Hadoop platform.
  • Worked on Spark SQL for structured data processing.
  • Involved in converting Hive queries into Spark transformations using Spark RDDs and Scala.
  • Utilized built-in Maven repositories for MapR distributions.
  • Created Spark custom UDFs and UDAFs to process business logic that varies based on requirements.
  • Developed UNIX shell scripts that update the index in Elasticsearch and automate the spark-submit command.
  • Experience in using the Spark application master to monitor Spark jobs and capture their logs.
  • Created tables in HBase to handle source XML data and analyzed the data with Hive queries by implementing Hive and HBase integration.
  • Experience in loading JSON data into Elasticsearch using Spark and Scala APIs.
  • Used Git for version control and code reviews.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments through Jenkins.
  • Worked with the System Analyst and development manager on a day-to-day basis.
  • Worked with the service delivery (QA defects) team on transition and stabilization.
  • Developed Spark APIs to import data into HBase from RDBMS using Sqoop.
  • Effectively worked on Spark transformations and actions to convert source XML files to JSON format and load them into Elasticsearch (a minimal sketch follows this list).
  • Filtered data from different data sources using Altova MapForce and provided it as input to the Spark process.
  • Involved in code reviews, code tuning, performance tuning and unit testing of the Spark application.
  • Involved in the creation and deletion of aliases for the respective indices in Kibana and used Kibana to visualize the indices loaded into Elasticsearch.
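
A minimal Scala sketch of the XML-to-Elasticsearch flow described above, assuming the spark-xml and elasticsearch-hadoop (elasticsearch-spark) connectors; the input path, row tag, query and index name are placeholders.

    import org.apache.spark.sql.SparkSession
    import org.elasticsearch.spark.sql._  // adds saveToEs to DataFrames

    object XmlToElastic {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("xml-to-elasticsearch")  // hypothetical job name
          .config("es.nodes", "es-host")    // placeholder Elasticsearch node
          .config("es.port", "9200")
          .getOrCreate()

        // Read source XML with the spark-xml package (placeholder path and row tag).
        val orders = spark.read
          .format("com.databricks.spark.xml")
          .option("rowTag", "order")
          .load("/data/incoming/orders.xml")

        // Spark SQL transformation standing in for the original query logic.
        orders.createOrReplaceTempView("orders")
        val enriched = spark.sql(
          "SELECT orderId, customerId, amount FROM orders WHERE amount > 0")

        // Index the result as JSON documents into a placeholder Elasticsearch index.
        enriched.saveToEs("orders/doc")

        spark.stop()
      }
    }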

Environment: Hadoop, MapR, Scala, Spark, Spark SQL, HDFS, Hive, HBase, Data Lake, Elasticsearch, IntelliJ IDEA, YARN, Altova MapForce, Altova XMLSpy, Aqua Data Studio, Kibana, Splunk, Dynatrace, SharePoint.

Confidential

Hadoop Developer

Responsibilities:

  • Developed transformations using custom MapReduce and Hive.
  • Developed Pig Latin scripts to extract and filter relevant data from web server output files and load it into HDFS.
  • Created MapReduce jobs using Pig Latin and Hive queries.
  • Worked on a Proof of Concept (POC) on Cloudera Impala; the use case was to compare Impala and Hive, and to look at how Impala's response time is better than Hive's when it comes to large batch processing.
  • Performed map-side joins in both Pig and Hive.
  • Built Spark DataFrames to process huge amounts of structured data (see the sketch after this list).
  • Used Sqoop to load data from RDBMS into HDFS.
  • Knowledge of running Hive queries using Spark SQL integrated with the Spark environment.
  • Optimized joins in Hive using techniques such as sort-merge join and map-side join.
  • Used JSON to represent complex data structures within a MapReduce job.
  • Designed data models from scratch to aid team in data collection to be used in Tableau visuals.
  • Worked on streaming to collect data from Flume and performed real-time batch processing.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Reviewed and managed Hadoop log files.
  • Loaded log data into HDFS using Flume and focused on creating MapReduce jobs to power the data for search and aggregation.
  • Involved in creating Hive tables, loading the data and writing hive queries.
  • Performed POCs in a Spark test environment.
  • Developed Pig scripts and UDFs extensively for Value Added Processing (VAPs).
  • Actively involved in the design analysis, coding and strategy development.
  • Used the Python subprocess module to execute UNIX shell commands.
  • Developed Hive scripts implementing dynamic partitions and buckets for retail history data.
  • Designed and developed read lock capability in HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
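
A minimal Scala sketch of processing Hive-managed data with Spark DataFrames as referenced above; the database, table names and report logic are placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object RetailHistoryReport {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("retail-history-report")  // hypothetical job name
          .enableHiveSupport()
          .getOrCreate()

        // Read an existing Hive table into a DataFrame (placeholder database/table).
        val sales = spark.table("retail_db.sales_history")

        // DataFrame aggregation standing in for the original HiveQL report logic.
        val dailyTotals = sales
          .filter(col("sale_date") >= lit("2017-01-01"))
          .groupBy("store_id", "sale_date")
          .agg(sum("amount").as("daily_total"))

        // Persist the result back to Hive (placeholder output table).
        dailyTotals.write.mode("overwrite").saveAsTable("retail_db.daily_sales_totals")

        spark.stop()
      }
    }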

Environment: Hadoop 2.x, Apache Spark, Spark SQL, Python, MapReduce, HDFS, Pig, Hive, HBase, Kafka, Java, Oracle 10g, Tableau, MySQL, Cloudera Hadoop distribution.

Confidential

Hadoop/ETL Developer

Responsibilities:

  • Responsible for creating databases, tables, indexes, constraints, views and stored procedures.
  • Involved in writing triggers and stored procedures.
  • Actively involved in normalization and de-normalization of the database.
  • Involved in performance tuning of the database.
  • Developed Sqoop commands to pull data from Teradata and push it to HDFS (an illustrative Spark-based equivalent is sketched after this list).
  • Used Sqoop to import data and metadata from Oracle.
  • Involved in deploying applications in AWS; proficient in Unix/Linux shell commands.
  • Developed custom reports using Microsoft Reporting Services.
  • Involved in developing complex ETL transformations and performance tuning.
  • Developed Pig scripts to convert data from Avro to text file format.
  • Involved in analyzing report design requirements and actively interacted with Business Analysts to understand the business requirements.
  • Monitored the performance of SQL Server using SQL Server Profiler.
  • Created user filters and action filters; created parameters and used them in calculations.
  • Created trace files on the queries and optimized the queries that were running slowly.
  • Wrote complex SQL statements using joins, subqueries and correlated subqueries.
  • Created SSIS packages to extract data from different systems and scheduled jobs using SQL Agent to call the packages and stored procedures.
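
The actual Teradata pulls in this role used Sqoop; purely as an illustration of the Teradata-to-HDFS movement, the following is a minimal Scala sketch of an equivalent pull expressed with Spark's JDBC reader. The host, database, table, credentials and landing path are placeholders, and the Teradata JDBC driver is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    object TeradataPull {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("teradata-to-hdfs")  // hypothetical job name
          .getOrCreate()

        // Read a placeholder Teradata table over JDBC.
        val source = spark.read.format("jdbc")
          .option("url", "jdbc:teradata://td-host/DATABASE=sales_db")
          .option("dbtable", "daily_sales")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("TD_PASSWORD", ""))
          .load()

        // Land the data in HDFS as Parquet (placeholder path).
        source.write.mode("overwrite").parquet("/data/landing/daily_sales")

        spark.stop()
      }
    }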

Environment: MapR, HDFS, Sqoop, Hive, MapReduce, AWS, SQL Server 2005, Enterprise Manager, UML, DTS, Microsoft Reporting Services, Agile, JIRA, Report Manager, SQL Agent
