
Big Data Developer Consultant Resume


Charlotte, North Carolina

PROFESSIONAL SUMMARY:

  • Over 8 years of professional IT experience, including the Big Data ecosystem and Data Science.
  • Excellent experience in Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Analyzed and processed complex data sets using advanced querying, visualization, and analytics tools.
  • Identified, measured, and recommended improvement strategies for KPIs across all business areas.
  • Experience using Talend Integration Suite (6.1/5.x) and Talend Open Studio (6.1/5.x).
  • Experience with Talend Admin Console (TAC).
  • Sound exposure to the retail market, including Retail Delivery Systems, as well as financial and medical insurance environments.
  • Managed Amazon Redshift clusters, including launching clusters with specified node configurations and running data analysis queries.
  • Managed a Kafka cluster on Hortonworks for streaming IoT data.
  • Experience using cloud components and connectors to make API calls for accessing data in cloud storage (Google Drive, Salesforce, Amazon S3, Dropbox) from Talend Open Studio.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, and Flume.
  • Worked with ORC, Avro, and Parquet formats and their codec compression.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • In-depth understanding of data structures and algorithms.
  • Applied performance tuning techniques to Hive and Impala tables.
  • Experience writing Python scripts for data ingestion into Hadoop from various sources (see the sketch after this list).
  • Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experience using the Airflow platform to monitor and retrofit data pipelines.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience managing Hadoop clusters using Cloudera Manager.
  • Experience working with the MapR Control System (MCS) in the MapR Hadoop distribution.
  • Very good experience across the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
  • Experience administering Red Hat Linux: installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Hands-on experience with VPN, PuTTY, WinSCP, Cyberduck, FileZilla, etc.
  • Wrote scripts to deploy monitors and checks and to automate critical system administration functions.
  • Hands-on experience in application development using Java, RDBMSs, and Linux shell scripting.
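The Python ingestion work referenced above typically lands source extracts in HDFS. Below is a minimal sketch, assuming the open-source hdfs (WebHDFS) Python package and a reachable NameNode; the host, user, paths, and file names are placeholders rather than details from any actual engagement.

  # Minimal ingestion sketch: pull a daily extract from a source system and land it in HDFS.
  # Assumes the `hdfs` PyPI package and a WebHDFS endpoint; all names are placeholders.
  from datetime import date
  from hdfs import InsecureClient

  def ingest_daily_extract(local_file: str) -> str:
      """Upload one day's extract into a date-partitioned HDFS directory."""
      client = InsecureClient("http://namenode.example.com:9870", user="etl")
      target_dir = f"/data/raw/orders/ds={date.today():%Y-%m-%d}"
      client.makedirs(target_dir)                 # no-op if the directory already exists
      return client.upload(target_dir, local_file, overwrite=True)

  if __name__ == "__main__":
      print(ingest_daily_extract("/tmp/orders_extract.csv"))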

SKILLS:

Hadoop Ecosystem Development: Sqoop, MapReduce, Hive, Pig, Flume, Oozie, Zookeeper, HBase, Spark, Storm, Kafka, Drill, HDFS

Data Science Tools: RStudio, Zeppelin, Jupyter

ETL Tools: Talend, Datameer

Operating Systems: Linux (Red Hat, Ubuntu), Windows XP, Windows Server 2003/2008

Databases: MySQL, Oracle, MS SQL Server, DB2, MS Access, HBase, MongoDB

Languages: Python, SQL, Pig Latin, UNIX shell scripting

Cloud: AWS (Redshift, EMR, EC2, S3, VPC, VPN, Load Balancer), Azure, HDInsight

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, North Carolina

Big Data Developer Consultant

Responsibilities:

  • Launched a 5-node MapR 5.2 cluster with 80 cores and 600 GB of RAM on AWS to implement a data recommendation engine.
  • Installed Spark, Hive, Oozie, RStudio, and Zeppelin on the MapR Hadoop cluster.
  • Mentored sophisticated organizations on large-scale data and analytics using advanced statistical and machine learning models.
  • Architected and implemented analytics and visualization components for a device data analysis platform used for hardware prediction.
  • Wrote Python and Scala APIs to move data from the client's RDBMS to AWS S3 and ran them on the Spark execution framework (see the PySpark sketch after this list).
  • Worked intensively with the Spark shell and PySpark.
  • Performed performance tuning and query optimization in AWS Redshift.
  • Performed data manipulations using Talend components such as tMap, tJavaRow, tJava, tOracleRow, tOracleInput, tOracleOutput, tMSSqlInput, and many more.
  • Worked on the Talend RTX ETL tool; developed and scheduled jobs in Talend Integration Suite.
  • Tuned ETL mappings, workflows, and the underlying data model to optimize load and query performance.
  • Created Spark DataFrames while loading data into Hive tables.
  • Used Avro and Parquet formats with codec compression in Hive and Impala.
  • Used a Kafka cluster to load IoT data onto the Hadoop cluster and designed an IoT data lake on a Kafka cluster on Azure.
  • Created a virtual private gateway (VPG) connection from the client network to the Hadoop cluster on AWS.
  • Used Sqoop to build the data lake from MS SQL Server into the Hadoop cluster and scheduled the jobs using crontab and Oozie.
  • Built a data pipeline from an FTP server to Hadoop using a Python API.
  • Connected Hive tables to RStudio and Zeppelin through sparklyr and JDBC connections, giving the data science team infrastructure access to run their models in notebooks on the AWS cloud.
  • Connected Hive tables on the Hadoop cluster to visualization tools such as Power BI and Tableau using ODBC and the MapR Drill connector.
  • Used S3 as backup storage for the Hadoop cluster.
  • Implemented a POC launching a Hadoop and Spark cluster on HDInsight.
  • Developed Spark jobs for the recommendation engine and validated the results using Python scripts.
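For the RDBMS-to-S3 load referenced above, the following is a minimal PySpark sketch, assuming a SQL Server source reachable over JDBC and Hive support enabled in the Spark session. The JDBC URL, credentials, table names, and S3 bucket are placeholders, not values from the engagement.

  # Sketch: pull a table from the client RDBMS over JDBC, land it on S3 as Parquet,
  # and register it as a Hive table. All endpoints and names are placeholders.
  from pyspark.sql import SparkSession

  spark = (SparkSession.builder
           .appName("rdbms-to-s3")
           .enableHiveSupport()
           .getOrCreate())

  orders = (spark.read.format("jdbc")
            .option("url", "jdbc:sqlserver://dbhost.example.com:1433;databaseName=sales")
            .option("dbtable", "dbo.orders")
            .option("user", "etl_user")
            .option("password", "********")
            .load())

  # Land the raw copy on S3 for the data lake, then expose it through the Hive metastore.
  orders.write.mode("overwrite").parquet("s3a://example-datalake/raw/orders/")
  orders.write.mode("overwrite").saveAsTable("raw.orders")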

Environment: AWS, Spark, Python, Azure, Jupyter, Zeppelin, Talend Open Studio (TOS), Talend 6.1/5.6, Redshift, Hadoop, Sqoop, Scala, Hive, Flume, HBase, Kafka, Pig, Java, Shell Scripting, Unix, MySQL, MS SQL Server, Ubuntu, Zookeeper

Confidential, Jacksonville, Florida

Hadoop Developer

Responsibilities:

  • Created Hive and HBase tables using the ORC file format and Snappy compression.
  • Integrated Apache Kafka for data ingestion.
  • Set up and installed a 20-node HDP cluster used to run the client's matching algorithm; migrated 100 TB of data along with all Spark and Hive jobs.
  • Wrote Pig scripts to clean up the ingested data and created partitions for the daily data.
  • Conducted and moderated a meetup for 60 Big Data enthusiasts from the medical equipment industry, educating them on HDP and Hadoop.
  • Involved in an R&D project to develop a business intelligence/reporting tool for Florida Blue.
  • Implemented partitioning and bucketing for Hive tables based on the requirements (see the sketch after this list).
  • Performed various data ingestion tasks from DB2 to HDFS using the Sqoop JDBC connector.
  • Created shell scripts to parameterize the Pig and Hive actions in Oozie workflows.
  • Designed and implemented incremental imports into Hive tables.
  • Created Hive tables in different file formats, compared ORC and Avro, and picked ORC as the best fit for the requirements.
  • Managed and reviewed the Hadoop log files.
  • Provisioned and managed a multi-tenant Cassandra cluster in a public cloud environment.
  • Created Hive external and managed tables, compared their performance, and settled on external tables.
  • Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
  • Aggregated data using advanced MapReduce features.
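The partitioning, bucketing, and ORC work above follows a common Hive pattern; below is a sketch of it expressed as HiveQL issued through PySpark so the examples here stay in a single language. The database, table, column names, and partition date are illustrative only.

  # Sketch: partitioned + bucketed ORC Hive table with Snappy compression and an
  # incremental daily load. HiveQL is run through spark.sql(); names are illustrative.
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.enableHiveSupport().getOrCreate()

  spark.sql("""
      CREATE TABLE IF NOT EXISTS claims.claim_events (
          claim_id  STRING,
          member_id STRING,
          amount    DECIMAL(12,2)
      )
      PARTITIONED BY (event_date STRING)
      CLUSTERED BY (member_id) INTO 32 BUCKETS
      STORED AS ORC
      TBLPROPERTIES ('orc.compress' = 'SNAPPY')
  """)

  # Incremental load: only the partition for the day being processed is rewritten.
  spark.sql("""
      INSERT OVERWRITE TABLE claims.claim_events PARTITION (event_date = '2016-06-01')
      SELECT claim_id, member_id, amount
      FROM staging.claim_events_daily
  """)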

Environment: Kafka, Hortonworks Hadoop distribution (HDP), MapReduce, HDFS, Hive, Pig, HBase, Linux, XML, MySQL, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Python

Confidential, Mclean, Virginia

Hadoop Developer

Responsibilities:

  • Imported data from Teradata systems to AWS S3 using data transfer tools and MapReduce.
  • Performed Hadoop data ingestion using the DataStage ETL tool and Hadoop transformations.
  • Worked on the inbuilt Quantum application, where workflows were run as Spark applications.
  • Experienced with the Linux operating system and shell scripting.
  • Worked on Apache Spark for real-time and batch processing.
  • Developed MapReduce code in Java and used Spark SQL/Streaming for faster testing and processing of data.
  • Used Kibana and Elasticsearch to handle log messages produced by multiple systems.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Implemented MongoDB and set up Mongo components to write data to MongoDB and S3 simultaneously.
  • Used GitHub as the version control tool.
  • Developed scalable, modular software packages for various APIs and applications.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
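The Hive-to-Spark conversion mentioned above was done with Scala and RDDs; the sketch below shows the same idea with PySpark DataFrames so that all examples in this resume stay in one language. The table, columns, and cutoff date are placeholders.

  # Sketch: a Hive aggregation query re-expressed as Spark DataFrame transformations.
  # Shown in PySpark (the project itself used Scala/RDDs); names are placeholders.
  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.enableHiveSupport().getOrCreate()

  # Original Hive query:
  #   SELECT account_id, SUM(amount) AS total_amount
  #   FROM txn.payments
  #   WHERE txn_date >= '2015-01-01'
  #   GROUP BY account_id;

  payments = spark.table("txn.payments")
  totals = (payments
            .filter(F.col("txn_date") >= "2015-01-01")
            .groupBy("account_id")
            .agg(F.sum("amount").alias("total_amount")))

  totals.write.mode("overwrite").saveAsTable("txn.payment_totals")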

Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Python, Sqoop, HBase, Pig, Oozie, Storm, Kerberos, Java, Linux, Shell Scripting

Confidential

Hadoop Developer

Responsibilities:

  • Worked on a live 16-node Hadoop cluster running CDH 4.4.
  • Performed Flume and Sqoop imports of data from the data warehouse platform to HDFS.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Implemented data classification algorithms using MapReduce design patterns.
  • Designed and deployed the Hadoop cluster and Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark, and Impala, on the Cloudera distribution.
  • Used Pig for data transformations, event joins, filtering, and pre-aggregations before storing the data in HDFS.
  • Supported MapReduce programs running on the cluster.
  • Extracted data from Teradata into HDFS using Sqoop (see the sketch after this list).
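The Teradata-to-HDFS extraction above is a standard Sqoop import; a minimal sketch, written as a Python wrapper around the Sqoop CLI to keep these examples in one language, is shown below. The host, database, credentials file, table, and target directory are placeholders, and some clusters would instead use the Teradata Connector for Hadoop.

  # Sketch: drive a Sqoop import from Teradata into HDFS from Python.
  # Standard Sqoop CLI flags; host, credentials, and paths are placeholders.
  import subprocess

  def sqoop_import_teradata(table: str, target_dir: str) -> None:
      cmd = [
          "sqoop", "import",
          "--connect", "jdbc:teradata://td-host.example.com/DATABASE=edw",
          "--driver", "com.teradata.jdbc.TeraDriver",
          "--username", "etl_user",
          "--password-file", "/user/etl/.td_password",  # keeps the password off the command line
          "--table", table,
          "--target-dir", target_dir,
          "--num-mappers", "4",
          "--as-textfile",
      ]
      subprocess.run(cmd, check=True)

  if __name__ == "__main__":
      sqoop_import_teradata("CUSTOMER_TXN", "/data/raw/customer_txn")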

Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, Linux, XML, MySQL, Java 6, Eclipse

Confidential

Software Engineer

Responsibilities:

  • Imported data into HDFS from various RDBMS servers using Sqoop, and exported aggregated data back to the RDBMS servers using Sqoop for further ETL operations.
  • Collected and aggregated web log data from sources such as web servers and mobile devices using Apache Flume, and stored the data in HDFS/HBase for analysis.
  • Developed an automated process, driven by shell scripts, to run the data ingestion.

Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, PIG
