Big Data Developer Resume
Dallas, TX
SUMMARY:
- Extensive IT experience in Big Data technologies, Data Management/Analytics, Data visualization.
- Worked in various domains including E-commerce, Automotive, and Manufacturing.
- Technical experience with Hortonworks 2.6.5 and Cloudera 4 distributions and Hadoop working environments including Hadoop 2.8.3, Hive 2.1.1, Sqoop 1.99.7, Flume 1.7.0, HBase 2.0.0, Nifi 2.x, Apache Spark 2.2.1, Scala 2.12.0, and Kafka 1.3.2.
- Technically skilled at developing new applications on Hadoop according to business needs and converting existing applications to the Hadoop environment.
- Exposure to analyzing data using HiveQL, HBase 1.3.0, and MapReduce programs in Java.
- Good understanding of workload management, schedulers, scalability and distributed platform architectures.
- Experience in Spark 2.2.1 programming with Scala and Python for high-volume data processing (see the aggregation sketch after this summary).
- Experience in collecting, processing, and aggregating large amounts of streaming data using Kafka 1.3.2 and Spark Streaming.
- In-Depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames.
- Experience in importing and exporting data using Sqoop 1.99.7 from HDFS to RDBMS and vice-versa
- Experience in building ETL pipelines using NIFI 2.x.
- Involved in creating Hive tables with partitioning and bucketing, loading data, and writing Hive queries.
- Experience in working with RDBMS including Oracle and MySQL 5.x
- Experience in developing scalable solutions using NoSQL databases including Cassandra 3.10, HBase 1.3.0
- Experience in working with AWS services such as EC2, Kinesis, and S3.
- Familiar with software development tools like Git and JIRA.
- Exposure to various software development methodologies like Agile and Waterfall.
- A good team player who can work independently in a fast-paced, multitasking environment, and a self-motivated learner.
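The sketch below illustrates the kind of Spark DataFrame aggregation in Scala referenced above. It is a minimal example only: the input path, column names, and output location are hypothetical placeholders rather than details from any specific project.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object SalesAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("SalesAggregation").getOrCreate()

        // Hypothetical input: raw records landed on HDFS as CSV
        val sales = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/raw/sales")

        // Aggregate with the DataFrame API: total and average amount per region and day
        val dailyRevenue = sales
          .groupBy(col("region"), col("sale_date"))
          .agg(sum("amount").as("total_amount"), avg("amount").as("avg_amount"))

        // Persist the curated result back to HDFS
        dailyRevenue.write.mode("overwrite").parquet("hdfs:///data/curated/daily_revenue")

        spark.stop()
      }
    }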
TECHNICAL SKILLS:
Cloud Technologies: Snowflake, AWS
Real Time Streaming: Apache Storm, Apache Kafka 1.3.2
Bigdata Technologies: Spark 2.1.0, Hive 2.1.1, HDFS, MapReduce, HBase 1.3.0, Nifi 2.x, Sqoop 1.99.7, Flume 1.7.0, Oozie 5.x
Database: Oracle 12c, SQL Server, MySQL, Db2
Hadoop Distributions: Cloudera 5.8.3, Hortonworks 2.5
Programming Languages: Scala 2.12.0, Python 3, Java 8, Shell scripting
Dashboard: Elastic Search, Kibana, Ambari
Operating System: Windows 10, CentOS 7.3, Mac OS 10.12.3
Data Warehousing: Teradata, Snowflake
IDEs: Eclipse 4.6, Visual Studio 2016, IntelliJ
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Developer
Responsibilities:
- Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark 2.1.0 for data aggregation (see the sketch at the end of this section).
- Used Spark SQL to create schema RDDs and load them into Hive tables.
- Developed Spark 2.1.0 code using Scala and Spark-SQL for faster processing of data.
- Improved organization of the data using techniques such as Hive partitioning and bucketing.
- Extracted data from MySQL databases to HDFS using Apache Nifi 2.x.
- Optimized Hive 2.0.x queries and joins to get better results for ad-hoc queries.
- Involved in creating Oozie 3.1.3 workflow and coordinator jobs to kick off Hive jobs on time for data availability.
- Involved in deploying the applications in AWS.
- Used Agile methodology for project management and Git for source code control.
Environment: Apache Spark 2.1.0, Nifi 2.x, HDFS 2.6.1, Hive 2.0.x, Hadoop distribution of Cloudera 5.9, Linux, Eclipse, MySQL 5.x
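A minimal sketch of the Spark 2.1 pattern described in this section: a Scala UDF applied through the DataFrame API, with the aggregated result loaded into a partitioned Hive table. The database, table, column, and path names are assumptions made for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object OrderEnrichment {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("OrderEnrichment")
          .enableHiveSupport()          // needed to write into Hive tables
          .getOrCreate()

        // Hypothetical UDF: normalize free-text status codes before aggregating
        val normalizeStatus = udf((s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

        val orders = spark.read.parquet("hdfs:///data/staging/orders")

        val dailyTotals = orders
          .withColumn("status", normalizeStatus(col("status")))
          .groupBy(col("order_date"), col("status"))
          .agg(sum("amount").as("total_amount"))

        // Load the aggregate into a Hive table partitioned by order_date
        dailyTotals.write
          .mode("overwrite")
          .partitionBy("order_date")
          .saveAsTable("analytics.daily_order_totals")

        spark.stop()
      }
    }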
Confidential - Dallas, TX
Big data developer
Responsibilities:
- Developed Spark 2.0 applications using RDDs and DataFrames to perform data cleansing, data transformations, and data aggregations.
- Extracted, transformed, and loaded (ETL) data from multiple federated data sources in Spark 2.0.
- Experience in in-memory computations with Spark RDDs for faster responses.
- Experience in handling large datasets using data partitioning, shared variables in Spark 2.0, effective and efficient joins, and various data transformations.
- Experience in tuning Spark 2.0 applications: setting the right batch interval, the correct level of parallelism, and memory usage.
- Implemented Apache Nifi 1.7.x flow topologies to perform cleansing operations before moving data into HDFS
- Developed Spark Streaming applications to perform the necessary operations in real time and persist the results into HBase (see the sketch at the end of this section).
- Utilized Spark SQL with the DataFrames API to provide efficient structured data processing.
- Experience in Spark application submission over a variety of cluster managers.
- Well versed in configuring Kafka 2.1.0 topics and scheduling Oozie workflows
Environment: Hadoop, Spark 2.0, Scala, Kafka 2.1.0, Hive, CDH 4.7.1, HBase, Nifi 1.7.x, Oozie, Linux, ETL
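A minimal sketch of the streaming pattern described in this section, assuming the spark-streaming-kafka-0-10 integration and the standard HBase client API. The broker address, topic, table, column family, and batch interval are placeholders; creating the HBase connection per partition is a simplification for readability.

    import java.util.UUID

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object EventStreamToHBase {
      def main(args: Array[String]): Unit = {
        // Batch interval is a tuning knob; 10 seconds here is only a placeholder
        val conf = new SparkConf().setAppName("EventStreamToHBase")
        val ssc  = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",          // assumed broker address
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "event-stream-consumer",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
        )

        // Persist each micro-batch into HBase; table, column family, and qualifier are hypothetical
        stream.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = connection.getTable(TableName.valueOf("events"))
            records.foreach { record =>
              val rowKey = Option(record.key()).getOrElse(UUID.randomUUID().toString)
              val put = new Put(Bytes.toBytes(rowKey))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(record.value()))
              table.put(put)
            }
            table.close()
            connection.close()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }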
Confidential - Dallas, TX
Hadoop Developer
Responsibilities:
- Developed data pipelines using Flume and Sqoop to extract data from weblogs and store it in HDFS.
- Used SQOOP 1.4.6 for importing and exporting data into HDFS and Hive.
- Involved in processing ingested raw data using MapReduce, Hive.
- Experience in moving processed data from Hadoop to relational databases or external file systems using Sqoop and the HDFS get/copyToLocal commands.
- Collected and aggregated large amounts of data using Apache Flume 1.6.0 and staged the data in HDFS for further analysis.
- Used Hue to run Hive queries and created day-wise partitions in Hive to improve performance.
- Developed, validated and maintained HiveQL queries
- Implemented partitioning and bucketing concepts in Hive and designed both managed and external tables for optimized performance (see the sketch at the end of this section).
- Involved in developing shell scripts to orchestrate execution of all other scripts (Hive and MapReduce) and move the data files within and outside of HDFS.
- Involved in using HCatalog to access Hive table metadata from MapReduce code.
- Supported MapReduce programs running on the cluster.
- Wrote Hive queries for data analysis to meet the business requirements.
Environment: Hadoop (HDFS/MapReduce), Hive, SQOOP, Hue, SQL, Linux
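A minimal sketch of the Hive table design described in this section, issued over a HiveServer2 JDBC connection from Scala. The endpoint, database, table, and column names are hypothetical; the point is the day-wise partitioned external table over raw files alongside a managed, bucketed table for ad-hoc queries.

    import java.sql.DriverManager

    object HiveTableSetup {
      def main(args: Array[String]): Unit = {
        // Assumed HiveServer2 endpoint; database, table, and column names are hypothetical
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/weblogs", "etl_user", "")
        val stmt = conn.createStatement()

        // External table over the raw weblog files, partitioned by day
        stmt.execute(
          """CREATE EXTERNAL TABLE IF NOT EXISTS weblogs_raw (
            |  user_id STRING, url STRING, response_code INT
            |)
            |PARTITIONED BY (log_date STRING)
            |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
            |LOCATION '/data/raw/weblogs'""".stripMargin)

        // Register a day-wise partition once its files have landed in HDFS
        stmt.execute(
          "ALTER TABLE weblogs_raw ADD IF NOT EXISTS PARTITION (log_date='2017-01-15') " +
            "LOCATION '/data/raw/weblogs/2017-01-15'")

        // Managed, bucketed ORC table that ad-hoc Hive queries read from
        stmt.execute(
          """CREATE TABLE IF NOT EXISTS weblogs_curated (
            |  user_id STRING, url STRING, response_code INT
            |)
            |PARTITIONED BY (log_date STRING)
            |CLUSTERED BY (user_id) INTO 16 BUCKETS
            |STORED AS ORC""".stripMargin)

        stmt.close()
        conn.close()
      }
    }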
Confidential
Hadoop Developer
Responsibilities:
- Developed workflow in SSIS to automate the tasks of loading the data into HDFS and processing using Hive.
- Moved relational database data into HDFS and Hive dynamic-partition tables using Sqoop and staging tables (see the sketch at the end of this section).
- Stored data in Parquet file format in Hive.
- Performed analytics and drew insights from the data using Hive.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Implemented Sqoop scripts to load data into Hive.
- Worked on data ingestion from Oracle to Hive and was involved in different data migration activities.
- Involved in fixing various issues related to data quality, data availability and data stability.
- Worked on the Hue interface for loading data into HDFS and querying it.
Environment: Hadoop, SQOOP, Hive, Oozie, SSIS, Linux
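A minimal sketch of the staging-to-Hive load described in this section, again over HiveServer2 JDBC. The table and column names are hypothetical; the dynamic-partition properties and the INSERT OVERWRITE ... PARTITION statement are the standard Hive mechanism for this kind of load into a Parquet table.

    import java.sql.DriverManager

    object StagingToParquetLoad {
      def main(args: Array[String]): Unit = {
        // Assumed HiveServer2 endpoint; database, table, and column names are hypothetical
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/sales", "etl_user", "")
        val stmt = conn.createStatement()

        // Allow fully dynamic partitioning for the load
        stmt.execute("SET hive.exec.dynamic.partition=true")
        stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Target table stored as Parquet and partitioned by load date
        stmt.execute(
          """CREATE TABLE IF NOT EXISTS orders (
            |  order_id BIGINT, customer_id BIGINT, amount DOUBLE
            |)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Move data from the Sqoop-loaded staging table; Hive derives the
        // partition value from the last column of the SELECT list
        stmt.execute(
          """INSERT OVERWRITE TABLE orders PARTITION (load_date)
            |SELECT order_id, customer_id, amount, load_date
            |FROM orders_staging""".stripMargin)

        stmt.close()
        conn.close()
      }
    }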
Confidential
Data Analyst
Responsibilities:
- Exported data from the RDBMS to CSV files for each month and each service category.
- Wrote SQL queries for data analysis and filtering out the required data for further processing.
- Performed SQL queries to extract data from Oracle SQL database.
- Performed initial descriptive data analysis and generated statistical reports.
- Developed regression algorithms to classify wire-down incidents as energized or non-energized and automated the detection procedure.
- Established an executive dashboard to demonstrate project achievements and effectively communicate the results.
- Generated weekly reports to discuss with the fault rectifying teams.
Environment: Tableau, MySQL, Excel
Confidential
SQL Developer
Responsibilities:
- Managed connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures.
- Used JDBC for database connectivity.
- Wrote SQL queries, stored procedures and database triggers on the database objects.
- Analyzed the data and created dashboards using Tableau.
- Used SQL queries and JDBC prepared statements for retrieving data from the MySQL database (see the sketch at the end of this section).
- Actively participated in and provided constructive feedback during daily stand-up meetings and weekly iteration review meetings.
Environment: Java 1.6, J2EE, Tableau, Eclipse, MySQL
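A minimal sketch of the JDBC prepared-statement access described in this section. It is written in Scala for consistency with the other sketches in this document (the project itself used Java 1.6); the connection URL, credentials, and table and column names are placeholders.

    import java.sql.{Date, DriverManager}

    object CustomerOrderQuery {
      def main(args: Array[String]): Unit = {
        // Assumed MySQL endpoint and credentials; schema, table, and column names are placeholders
        val conn = DriverManager.getConnection("jdbc:mysql://dbhost:3306/orders", "app_user", "secret")

        // Prepared statement: bind variables avoid SQL injection and repeated parsing
        val ps = conn.prepareStatement(
          "SELECT order_id, amount FROM customer_orders WHERE customer_id = ? AND order_date >= ?")
        ps.setLong(1, 1001L)
        ps.setDate(2, Date.valueOf("2014-01-01"))

        val rs = ps.executeQuery()
        while (rs.next()) {
          println(s"${rs.getLong("order_id")} -> ${rs.getDouble("amount")}")
        }

        rs.close()
        ps.close()
        conn.close()
      }
    }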